Connecting to a shard in the shell and performing operations


In this recipe, we will connect to a sharded cluster from the mongo shell, see how to shard a collection, and observe the data being split across shards using some test data.

Getting ready

Obviously, we need a sharded mongo server set up and running. See the previous recipe, Starting a simple sharded environment of two shards, for details on how to set up a simple shard. As in the previous recipe, the mongos process should be listening on port number 27017. We have some names in a JavaScript file called names.js. This file needs to be downloaded from the Packt website and kept on the local filesystem. The file defines a variable called names, whose value is an array of JSON documents, each one representing a person. The contents look as follows:

names = [
  {name:'James Smith', age:30},
  {name:'Robert Johnson', age:22},
…
]
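
Since names.js simply assigns the names array, starting the shell with this file (as in step 1 of the next section) makes the array available for inspection. A quick sanity check that the file loaded correctly:

    mongos> names.length
    20
    mongos> names[0]
    { "name" : "James Smith", "age" : 30 }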

How to do it…

  1. Start the mongo shell and connect to the default port on localhost as follows. Loading names.js ensures that the names array is available in the current shell:

    mongo --shell names.js
    MongoDB shell version: 3.0.2
    connecting to: test
    mongos>
    
  2. Switch to the database that will be used to test sharding; we call it shardDB:

    mongos> use shardDB
    
  3. Enable sharding at the database level as follows:

    mongos> sh.enableSharding("shardDB")
    
  4. Shard a collection called person as follows:

    mongos> sh.shardCollection("shardDB.person", {name: "hashed"}, false)
    
  5. Add the test data to the sharded collection (a faster bulk-insert variant is sketched after this list):

    mongos> for(i = 1; i <= 300000 ; i++) {
    ... person = names[Math.round(Math.random() * 100) % 20]
    ... doc = {_id:i, name:person.name, age:person.age}
    ... db.person.insert(doc)
    ... }
    
  6. Execute the following to see the distribution of documents across the shards:

    mongos> db.person.getShardDistribution()
    
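A note on step 5: inserting 300,000 documents one by one means 300,000 round trips to the server, which can be slow. As an optional variation (not part of the recipe itself), the shell's unordered bulk API, available since MongoDB 2.6, batches the writes. A minimal sketch:

    mongos> var bulk = db.person.initializeUnorderedBulkOp()
    mongos> for(i = 1; i <= 300000 ; i++) {
    ... person = names[Math.round(Math.random() * 100) % 20]
    ... bulk.insert({_id:i, name:person.name, age:person.age})
    ... }
    mongos> bulk.execute()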

How it works…

This recipe demands some explanation. We downloaded a JavaScript file that defines an array of 20 people, where each element of the array is a JSON object with name and age attributes. We start the shell connected to the mongos process, with this JavaScript file loaded. We then switch to shardDB, the database we use to test sharding.

For a collection to be sharded, the database in which it will be created needs to be enabled for the sharding first. We do this using sh.enableSharding().

The next step is to shard the collection itself. By default, all the data is kept on one shard and not split across different shards. Think about it: how would Mongo be able to split the data meaningfully? The whole intention is to split it meaningfully, and as evenly as possible, so that whenever we query based on the shard key, Mongo can easily determine which shard(s) to query. If a query doesn't contain the shard key, it is executed on all the shards, and the data is then collated by the mongos process before being returned to the client. Thus, choosing the right shard key is crucial.
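
You can observe this routing behavior yourself once the collection is sharded and loaded. The exact explain() output varies across versions, but the winning plan's stage typically indicates whether the query was routed to a single shard or merged from all of them:

    mongos> db.person.find({name: "James Smith"}).explain()
    // equality on the shard key: the plan typically shows a SINGLE_SHARD stage
    mongos> db.person.find({age: 30}).explain()
    // shard key absent: the plan typically shows SHARD_MERGE across all shards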

Let's now see how to shard the collection. We do this by invoking sh.shardCollection("shardDB.person", {name: "hashed"}, false). There are three parameters here:

  • The first parameter is the fully qualified name of the collection, in the <db name>.<collection name> format.

  • The second parameter is the field of the collection to shard on. This is the field used to split the documents across the shards. One of the requirements of a good shard key is that it should have high cardinality (the number of possible values should be high). In our test data, the name field has very low cardinality and is thus not a good choice as a shard key on its own, so we hash it by specifying the key as {name: "hashed"}.

  • The last parameter specifies whether the value used as the shard key is unique. The name field is definitely not unique, so it will be false. If the field were, say, the person's social security number, it could have been set to true; an SSN is also a good choice of shard key due to its high cardinality. Remember that the shard key has to be present in a query for it to be routed efficiently.
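
To verify that the enableSharding() and shardCollection() calls took effect, sh.status() prints the state of the sharded cluster, including each sharded collection and its shard key. The output below is trimmed, and details such as the primary shard may differ on your setup:

    mongos> sh.status()
    ...
    databases:
      {  "_id" : "shardDB",  "partitioned" : true,  "primary" : "shard0000" }
        shardDB.person
          shard key: { "name" : "hashed" }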

The last step shows how the data is split across the two shards. With 300,000 documents, we expect roughly 150,000 documents on each shard. From the distribution statistics, we can observe that shard0000 has 149,715 documents, whereas shard0001 has 150,285:

Shard shard0000 at localhost:27000
 data : 15.99MiB docs : 149715 chunks : 2
 estimated data per chunk : 7.99MiB
 estimated docs per chunk : 74857

Shard shard0001 at localhost:27001
 data : 16.05MiB docs : 150285 chunks : 2
 estimated data per chunk : 8.02MiB
 estimated docs per chunk : 75142

Totals
 data : 32.04MiB docs : 300000 chunks : 4
 Shard shard0000 contains 49.9% data, 49.9% docs in cluster, avg obj size on shard : 112B
 Shard shard0001 contains 50.09% data, 50.09% docs in cluster, avg obj size on shard : 112B

There are a couple of additional exercises that I recommend you try.

Connect to the individual shards from the mongo shell and execute queries on the person collection. Check that the counts in these collections are in line with what we see in the preceding distribution. Additionally, you can verify that no document exists on both shards at the same time.
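
For example, with the two shards listening on ports 27000 and 27001, as in the distribution output above, you can connect to each mongod directly and count the documents. The counts shown assume the same run as the earlier output; yours will differ since the data is generated randomly:

    $ mongo localhost:27000/shardDB
    > db.person.count()
    149715
    > exit

    $ mongo localhost:27001/shardDB
    > db.person.count()
    150285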

We briefly discussed how cardinality affects the way data is split across shards. Let's do a simple exercise: first drop the person collection, then execute the shardCollection operation again, but this time with the {name: 1} shard key instead of {name: "hashed"}. This ensures that the shard key is not hashed but stored as is. Now load the data using the loop from step 5, and run getShardDistribution() on the collection once the data is loaded. Observe how the data is now split (or not) across the shards.
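
Put together, the exercise looks something like the following sketch, with the same loading loop as in step 5. With only 20 distinct names, don't be surprised if the chunks, and hence the data, end up far less evenly spread than with the hashed key:

    mongos> use shardDB
    mongos> db.person.drop()
    mongos> sh.shardCollection("shardDB.person", {name: 1}, false)
    mongos> for(i = 1; i <= 300000 ; i++) {
    ... person = names[Math.round(Math.random() * 100) % 20]
    ... db.person.insert({_id:i, name:person.name, age:person.age})
    ... }
    mongos> db.person.getShardDistribution()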

There's more…

A lot of questions may be coming up now: what are the best practices? What are some tips and tricks? How does MongoDB pull off sharding behind the scenes in a way that is transparent to the end user?

This recipe explained only the basics. All such questions will be answered in the administration section.