MongoDB Cookbook - Second Edition

By: Amol Nayak

Overview of this book

MongoDB is a high-performance and feature-rich NoSQL database that forms the backbone of the systems powering many different organizations – it’s easy to see why it’s the most popular NoSQL database on the market. Packed with features that have become essential for many types of software professionals, and incredibly easy to use, this cookbook contains solutions to the everyday challenges of MongoDB, as well as guidance on effective techniques to extend your skills and capabilities. The book starts with how to initialize the server in three different modes with various configurations. You will then be introduced to programming language drivers in both Java and Python. A new feature in MongoDB 3 is that you can connect to a single node using Python, set to make MongoDB even more popular with anyone working with Python. You will then learn a range of further topics, including advanced query operations, monitoring and backup using MMS, and some very useful administration recipes, including SCRAM-SHA-1 authentication. Beyond that, you will also find recipes on cloud deployment, including guidance on working with Docker containers alongside MongoDB, integrating the database with Hadoop, and tips for improving developer productivity. Created as both an accessible tutorial and an easy-to-use resource to have on hand whenever you need to solve a problem, MongoDB Cookbook will help you handle everything from administration to automation with MongoDB more effectively than ever before.

Starting multiple instances as part of a replica set


In this recipe, we will look at starting multiple servers on the same host as a cluster. Starting a single mongo server is enough for development purposes or non-mission-critical applications. For crucial production deployments, we need high availability: if one server instance fails, another instance takes over and the data remains available to query, insert, or update. Clustering is an advanced concept and we cannot do it justice in a single recipe. Here, we will touch the surface and go into more detail in other recipes in the administration section later in the book. In this recipe, we will start multiple mongo server processes on the same machine for testing purposes. In a production environment, they will run on different machines (or virtual machines) in the same or even different data centers.

Let's see briefly what a replica set is. As the name suggests, it is a set of servers that are replicas of each other in terms of data. How they are kept in sync with each other, and other internals, is something we will defer to later recipes in the administration section, but one thing to remember is that write operations happen only on one node, the primary. By default, all querying also happens on the primary, though we may explicitly permit read operations on secondary instances. An important fact to remember is that replica sets are not meant to achieve scalability by distributing read operations across the nodes of a replica set; their sole objective is to ensure high availability.
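As an early taste of what this enables: once the replica set built in this recipe is up, reads can be permitted on a secondary for the current shell session with the rs.slaveOk() helper. The following is a minimal sketch, assuming the set from this recipe with a secondary listening on port 27001; querying a secondary is covered properly in a later recipe:

$ mongo localhost:27001
repSetTest:SECONDARY> rs.slaveOk()      // allow reads on this secondary for this connection only
repSetTest:SECONDARY> db.person.find()  // would otherwise fail with a 'not master' style error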

Getting ready

Though not a prerequisite, taking a look at the Starting a single node instance using command-line options recipe will definitely make things easier, just in case you are not aware of the various command-line options and their significance when starting a mongo server. Additionally, the necessary binaries and setup mentioned in the single-server recipe must be in place before we continue with this recipe. Let's sum up what we need to do.

We will start three mongod processes (mongo server instances) on our localhost.

We will create three data directories, /data/n1, /data/n2, and /data/n3, for Node1, Node2, and Node3, respectively. Similarly, we will redirect the logs to /logs/n1.log, /logs/n2.log, and /logs/n3.log. The following image will give you an idea of how the cluster will look:

How to do it…

Let's take a look at the steps in detail:

  1. Create the /data/n1, /data/n2, /data/n3, and /logs directories for the data and logs of the three nodes, respectively. On the Windows platform, you can choose the c:\data\n1, c:\data\n2, c:\data\n3, and c:\logs directories, or any other directories of your choice, for the data and logs respectively. Ensure that these directories have the appropriate write permissions for the mongo server to write the data and logs.

  2. Start the three servers as follows. Users on the Windows platform need to skip the --fork option as it is not supported:

    $ mongod --replSet repSetTest --dbpath /data/n1 --logpath /logs/n1.log --port 27000 --smallfiles --oplogSize 128 --fork
    $ mongod --replSet repSetTest --dbpath /data/n2 --logpath /logs/n2.log --port 27001 --smallfiles --oplogSize 128 --fork
    $ mongod --replSet repSetTest --dbpath /data/n3 --logpath /logs/n3.log --port 27002 --smallfiles --oplogSize 128 --fork
    
  3. Start the mongo shell and connect to any of the running mongo servers. In this case, we connect to the first one (listening on port 27000). Execute the following command:

    $ mongo localhost:27000
    
  4. Try to execute an insert operation from the mongo shell after connecting to it:

    > db.person.insert({name:'Fred', age:35})
    

    This operation should fail as the replica set has not been initialized yet. More information can be found in the How it works… section.

  5. The next step is to start configuring the replica set. We start by preparing a JSON configuration in the shell as follows:

    cfg = {
      '_id': 'repSetTest',
      'members': [
        {'_id': 0, 'host': 'localhost:27000'},
        {'_id': 1, 'host': 'localhost:27001'},
        {'_id': 2, 'host': 'localhost:27002'}
      ]
    }
  6. The last step is to initiate the replica set with the preceding configuration as follows:

    > rs.initiate(cfg)
    
  7. Execute rs.status() on the shell after a few seconds to see the status. Within a few seconds, one of the members should become the primary and the remaining two should become secondaries.
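Once a primary has been elected, the insert that failed in step 4 should succeed when retried from a shell connected to the primary. The following is a quick check; the output shown is indicative of what the MongoDB 3.0 shell prints:

    repSetTest:PRIMARY> db.person.insert({name:'Fred', age:35})
    WriteResult({ "nInserted" : 1 })
    repSetTest:PRIMARY> db.person.findOne()
    { "_id" : ObjectId("..."), "name" : "Fred", "age" : 35 }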

How it works…

We described the common options earlier in the Starting a single node instance using command-line options recipe, where all of these command-line options are covered in detail.

As we are starting three independent mongod services, we have three dedicated database paths on the filesystem. Similarly, we have three separate log file locations, one for each process. We then start the three mongod processes with their database and log file paths specified. As this setup is for test purposes and is started on the same machine, we use the --smallfiles and --oplogSize options; these keep the preallocated data files and the oplog small, which reduces the disk space the test setup consumes. As these processes run on the same host, we also choose the ports explicitly to avoid port conflicts; the ports chosen here are 27000, 27001, and 27002. When we start the servers on different hosts, we may or may not choose a separate port; we can very well use the default one whenever possible.

The --fork option demands some explanation. By choosing this option, we start the server as a background process from our operating system's shell and get the control back in the shell where we can then start more such mongod processes or perform other operations. In the absence of the --fork option, we cannot start more than one process per shell and would need to start three mongod processes in three separate shells.
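The forked processes also keep running after you close the terminal, so remember to stop them when you are done experimenting. One way to do this cleanly from the mongo shell is sketched below: connect to each member in turn and shut it down, leaving the primary for last (by the time you get to it, it will have stepped down to a secondary, as it no longer has a majority):

$ mongo localhost:27002
repSetTest:SECONDARY> use admin
repSetTest:SECONDARY> db.shutdownServer()   // the shell will report that the connection was closed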

If we take a look at the logs generated in the log directory, we should see the following lines in it:

[rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
[rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done

Though we started three mongod processes with the --replSet option, we still haven't configured them to work with each other as a replica set. This command-line option just tells the server on startup that this process will be running as a part of a replica set; the name of the replica set is the value of this option passed on the command prompt. This also explains why the insert operation executed on one of the nodes failed before the replica set was initialized. In mongo replica sets, there can be only one primary node, where all the inserting and querying happens. In the image shown, the N1 node is the primary and listens on port 27000 for client connections. All the other nodes are slave/secondary instances, which sync themselves with the primary, and hence querying is disabled on them by default. It is only when the primary goes down that one of the secondaries takes over and becomes the primary. However, it is possible to query a secondary for data, as we have shown in the image; we will see how to query from a secondary instance in the next recipe.
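Incidentally, to check which role the member you are connected to currently has, and which member is the primary, you can run the isMaster command from the shell. The following is a minimal sketch; the output is trimmed and the exact set of fields varies by version:

repSetTest:PRIMARY> db.isMaster()
{
        "setName" : "repSetTest",
        "ismaster" : true,
        "secondary" : false,
        "primary" : "localhost:27000",
        "hosts" : [ "localhost:27000", "localhost:27001", "localhost:27002" ],
        ...
}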

Well, all that is left now is to configure the replica set by grouping the three processes that we started. This is done by first defining a JSON object as follows:

cfg = {
  '_id': 'repSetTest',
  'members': [
    {'_id': 0, 'host': 'localhost:27000'},
    {'_id': 1, 'host': 'localhost:27001'},
    {'_id': 2, 'host': 'localhost:27002'}
  ]
}

There are two fields, _id and members, for the unique name of the replica set and an array of the hostnames and port numbers of the mongod server processes that are part of this replica set, respectively. Using localhost to refer to the host is not a very good idea and is usually discouraged; however, in this case, as we started all the processes on the same machine, we are okay with it. It is preferred that you refer to the hosts by their hostnames, even if they are running on localhost. Note that you cannot mix localhost and hostnames for the instances in the same configuration; it is either hostnames or localhost for all members. To configure the replica set, we then connect to any one of the three running mongod processes; in this case, we connect to the first one and then execute the following from the shell:

> rs.initiate(cfg)
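If you would rather follow the recommendation and use hostnames, the same configuration can be written with your machine's name in place of localhost. The following is a hypothetical variant, assuming the machine is named Amol-PC (the name that appears in the error message below); the shell's getHostName() helper or your operating system's hostname command will tell you the name to use, which must resolve on your machine:

cfg = {
  '_id': 'repSetTest',
  'members': [
    {'_id': 0, 'host': 'Amol-PC:27000'},
    {'_id': 1, 'host': 'Amol-PC:27001'},
    {'_id': 2, 'host': 'Amol-PC:27002'}
  ]
}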

The _id field in the cfg object passed must have the same value as the one we gave to the --replSet option on the command prompt when we started the server processes. Giving a different value throws the following error:

{
        "ok" : 0,
        "errmsg" : "couldn't initiate : set name does not match the set name host Amol-PC:27000 expects"
}
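If you are unsure which set name a running mongod was actually started with, the getCmdLineOpts admin command shows the options the process was started with. A minimal sketch, with the output trimmed (the exact structure of the parsed document varies by version):

> db.adminCommand({getCmdLineOpts: 1})
{
        "argv" : [ "mongod", "--replSet", "repSetTest", ... ],
        "parsed" : { "replication" : { "replSet" : "repSetTest" }, ... },
        "ok" : 1
}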

If all goes well and the initiate call is successful, we should see something similar to the following JSON response on the shell:

{"ok" : 1}

In a few seconds, you should see a different prompt in the shell from which we executed this command, as the member becomes a primary or a secondary. The following is an example of the shell connected to a primary member of the replica set:

repSetTest:PRIMARY>

Executing rs.status() should give us some statistics on the replica set's status, which we will explore in depth in a recipe in the administration section later in the book. For now, the stateStr field is the important one; it contains values such as PRIMARY and SECONDARY.
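For a quick overview of just the member names and their states, the members array in the rs.status() result can be iterated over from the shell; a minimal sketch (the order of the members and the states you see may differ):

repSetTest:PRIMARY> rs.status().members.forEach(function(m) { print(m.name + '  ' + m.stateStr) })
localhost:27000  PRIMARY
localhost:27001  SECONDARY
localhost:27002  SECONDARY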

There's more…

Look at the Connecting to the replica set in the shell to query and insert data recipe to perform more operations from the shell after connecting to a replica set. Replication isn't as simple as we saw here. See the administration section for more advanced recipes on replication.

See also

If you are looking to convert a standalone instance into a replica set, the instance with the data needs to become the primary first, and then empty secondary instances are added, to which the data will be synchronized. Refer to the following URL for how to perform this operation:

http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
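At a very high level, and glossing over the details covered at the preceding URL, the conversion amounts to restarting the standalone with the --replSet option, initiating a single-member set, and then adding empty members. The following is only a rough sketch; node2.example.com and node3.example.com are hypothetical hosts for the new, empty secondaries:

> rs.initiate()                       // run on the restarted standalone; it becomes the primary
> rs.add('node2.example.com:27017')   // add an empty secondary that will sync the data
> rs.add('node3.example.com:27017')   // add another empty secondary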