Connecting to a shard from the Mongo shell and performing operations


In this recipe, we will connect to a sharded cluster from the Mongo shell, shard a collection, and observe the data being split across the shards using some test data.

Getting ready

We need a sharded MongoDB setup that is up and running; see the previous recipe for details on how to set up a simple shard. As in the previous recipe, the mongos process should be listening on port 27017. We also need a JavaScript file called names.js, which can be downloaded from the book's site and kept on the local filesystem. The file defines a variable called names whose value is an array of JSON documents, each one representing a person. The contents look as follows:

names = [
  {name:'James Smith', age:30},
  {name:'Robert Johnson', age:22},
…
]

How to do it…

Let's take a look at the steps in detail:

  1. Start the Mongo shell and connect to the default port on localhost as follows (loading names.js ensures that the names array is available in the current shell):

    mongo --shell names.js
    MongoDB shell version: 2.4.6
    connecting to: test
    mongos>
    
  2. Switch to the database that will be used to test sharding as follows (we call it shardDB):

    mongos> use shardDB
    
  3. Enable sharding at the database level as follows:

    mongos> sh.enableSharding("shardDB")
    
  4. Shard a collection called person as follows:

    mongos> sh.shardCollection("shardDB.person", {name: "hashed"}, false)
    
  5. Add test data to the sharded collection as follows:

    mongos> for(i = 1; i <= 300000 ; i++) {
    ... person = names[Math.round(Math.random() * 100) % 20]
    ... doc = {_id:i, name:person.name, age:person.age}
    ... db.person.insert(doc)
    ... }
    
  6. Execute the following command to get a query plan and the number of documents on each shard:

    mongos> db.person.find().explain()
    
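In addition to the explain plan from step 6, the chunk distribution of the sharded collection can be inspected from the same mongos shell. The sh.status() helper prints a summary of the shards and the sharded collections, and the config database keeps one document per chunk; the exact output depends on your data and version, so the following is only an illustration of the commands:

  mongos> sh.status()
  mongos> use config
  mongos> db.chunks.count({ns: "shardDB.person"})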

How it works…

This recipe demands some explanation. We downloaded a JavaScript file that defines an array of 20 people, where each element of the array is a JSON object with a name and an age attribute. We then started the shell connected to the mongos process with this JavaScript file loaded, and switched to shardDB, the database we will use for sharding.

For a collection to be sharded, the database in which it will be created needs to be enabled for sharding first. We do this using sh.enableSharding().
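
The sh.enableSharding() helper is a thin wrapper around the enableSharding admin command, so the step above is roughly equivalent to running the command directly against the admin database (the response shown is typical, but its exact shape can vary by version):

  mongos> db.adminCommand({enableSharding: "shardDB"})
  { "ok" : 1 }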

The next step is to shard the collection itself. By default, all the data is kept on one shard and is not split across shards. For the split to be meaningful, the data should be distributed as evenly as possible so that, whenever we query based on the shard key, Mongo can easily determine which shard(s) to query. If a query doesn't contain the shard key, it is executed on all the shards and the results are collated by the mongos process before being returned to the client. Choosing the right shard key is therefore crucial.
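
As a quick illustration with the person collection from this recipe (sharded on the hashed name key), comparing the explain() output of the following two queries shows the difference between a targeted query and a scatter-gather query; an equality match on name can be routed to the shard owning that hash value, whereas a query on age alone is broadcast to every shard:

  mongos> db.person.find({name: "James Smith"}).explain()
  mongos> db.person.find({age: 30}).explain()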

Let's now look at how we sharded the collection, which we did by invoking sh.shardCollection("shardDB.person", {name: "hashed"}, false). There are three parameters here:

  • The first parameter is the fully qualified name of the collection in the <db name>.<collection name> format.

  • The second parameter specifies the field of the collection to shard on; this is the field that will be used to split the documents across the shards. One of the requirements of a good shard key is high cardinality (a large number of possible values). In our test data, the name field has very low cardinality and is therefore not a good shard key on its own, so we hash it by specifying the key as {name: "hashed"}.

  • The last parameter specifies whether the values of the shard key are unique. The name field is definitely not unique, so it is false here. If the field were, say, a person's social security number, it could be set to true; an SSN is also a good choice for a shard key because of its high cardinality (see the sketch after this list). Remember, though, that for a query to be routed efficiently, it has to include the shard key.
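
For instance, a collection keyed on such a unique field could be sharded as follows; the shardDB.employee collection and its ssn field are purely hypothetical and not part of this recipe's data (also note that the unique flag only works with a range-based key such as {ssn: 1}, not with a hashed key):

  mongos> sh.shardCollection("shardDB.employee", {ssn: 1}, true)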

The last step is to look at the execution plan for finding all the data. The intent of this operation is to see how the data is split across the two shards. With 300,000 documents, we expect around 150,000 documents on each shard. In the explain output, the shards attribute has one entry per shard in the cluster; in our case there are two, each giving the query plan for that shard. In each of them, the value of n is worth looking at, as it gives the number of documents that reside on that shard. The following code snippet is the relevant JSON document we see on the console. The number of documents on shards one and two is 164938 and 135062, respectively:

"shards" : {
  "localhost:27000" : [
    {
      "cursor" : "BasicCursor",
      "isMultiKey" : false,
      "n" : 164938,
      "nscannedObjects" : 164938,
      "nscanned" : 164938,
      "nscannedObjectsAllPlans" : 164938,
      "nscannedAllPlans" : 164938,
      "scanAndOrder" : false,
      "indexOnly" : false,
      "nYields" : 1,
      "nChunkSkips" : 0,
      "millis" : 974,
      "indexBounds" : {

      },
      "server" : "Amol-PC:27000"
    }
  ],
  "localhost:27001" : [
    {
      "cursor" : "BasicCursor",
      "isMultiKey" : false,
      "n" : 135062,
      "nscannedObjects" : 135062,
      "nscanned" : 135062,
      "nscannedObjectsAllPlans" : 135062,
      "nscannedAllPlans" : 135062,
      "scanAndOrder" : false,
      "indexOnly" : false,
      "nYields" : 0,
      "nChunkSkips" : 0,
      "millis" : 863,
      "indexBounds" : {

      },
      "server" : "Amol-PC:27001"
    }
  ]
}

There are a couple of additional things I recommend you do.

Connect to the individual shards from the Mongo shell and execute queries on the person collection (as sketched below). Verify that the counts in these collections are close to what we see in the preceding plan, and that no document exists on both shards at the same time.
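
A minimal way to do this check, assuming the two shards listen on ports 27000 and 27001 as shown in the preceding explain output:

  $ mongo --port 27000 shardDB --eval "db.person.count()"
  $ mongo --port 27001 shardDB --eval "db.person.count()"

The two counts should add up to 300,000 and match the n values in the plan.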

We briefly discussed how cardinality affects the way the data is split across shards. Let's do a simple exercise (see the sketch after this paragraph). First, drop the person collection and execute the shardCollection operation again, but this time with the {name: 1} shard key instead of {name: "hashed"}, so that the shard key is stored as is, without hashing. Then load the data using the JavaScript loop from step 5 and, once the data is loaded, execute explain() on the collection. Observe how the data is now split (or not) across the shards.
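
Put together, the exercise looks roughly like this from the mongos shell (reusing the names array from names.js loaded at the start of the recipe):

  mongos> use shardDB
  mongos> db.person.drop()
  mongos> sh.shardCollection("shardDB.person", {name: 1}, false)
  mongos> for(i = 1; i <= 300000 ; i++) {
  ... person = names[Math.round(Math.random() * 100) % 20]
  ... db.person.insert({_id:i, name:person.name, age:person.age})
  ... }
  mongos> db.person.find().explain()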

There's more…

A lot of questions might come up now, such as what the best practices are, what tips and tricks there are, and how MongoDB pulls off sharding behind the scenes in a way that is transparent to the end user.

This recipe only explained the basics. All these questions will be answered in Chapter 4, Administration.