Book Image

MongoDB Cookbook - Second Edition - Second Edition

By : Amol Nayak
Book Image

MongoDB Cookbook - Second Edition - Second Edition

By: Amol Nayak

Overview of this book

MongoDB is a high-performance and feature-rich NoSQL database that forms the backbone of the systems that power many different organizations – it’s easy to see why it’s the most popular NoSQL database on the market. Packed with many features that have become essential for many different types of software professionals and incredibly easy to use, this cookbook contains many solutions to the everyday challenges of MongoDB, as well as guidance on effective techniques to extend your skills and capabilities. This book starts with how to initialize the server in three different modes with various configurations. You will then be introduced to programming language drivers in both Java and Python. A new feature in MongoDB 3 is that you can connect to a single node using Python, set to make MongoDB even more popular with anyone working with Python. You will then learn a range of further topics including advanced query operations, monitoring and backup using MMS, as well as some very useful administration recipes including SCRAM-SHA-1 Authentication. Beyond that, you will also find recipes on cloud deployment, including guidance on how to work with Docker containers alongside MongoDB, integrating the database with Hadoop, and tips for improving developer productivity. Created as both an accessible tutorial and an easy to use resource, on hand whenever you need to solve a problem, MongoDB Cookbook will help you handle everything from administration to automation with MongoDB more effectively than ever before.
Table of Contents (17 chapters)
MongoDB Cookbook Second Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Running a MapReduce job on Amazon EMR


This recipe involves running the MapReduce job on the cloud using AWS. You will need an AWS account in order to proceed. Register with AWS at http://aws.amazon.com/. We will see how to run a MapReduce job on the cloud using Amazon Elastic Map Reduce (Amazon EMR). Amazon EMR is a managed MapReduce service provided by Amazon on the cloud. Refer to https://aws.amazon.com/elasticmapreduce/ for more details. Amazon EMR consumes data, binaries/JARs, and so on from AWS S3 bucket, processes them and writes the results back to S3 bucket. Amazon Simple Storage Service (Amazon S3) is another service by AWS for data storage on the cloud. Refer to http://aws.amazon.com/s3/ for more details on Amazon S3. Though we will use the mongo-hadoop connector, an interesting fact is that we won't require a MongoDB instance to be up and running. We will use the MongoDB data dump stored in an S3 bucket for our data analysis. The MapReduce program will run on the input BSON dump...