Write concern and its significance


Write concern is the minimum guarantee that the MongoDB server provides for a write operation performed by a client. The client application sets one of several levels of write concern to obtain a guarantee from the server that a particular stage in the write process has been reached on the server side.

The stronger the guarantee demanded, the longer it potentially takes to get a response from the server. With write concern, we don't always need the server to acknowledge that the write operation was completely successful. For less crucial data such as logs, we might be more interested in pushing more writes per second over a connection. On the other hand, when we update sensitive information such as customer details, we want to be sure the write is successful (consistent and durable); data integrity is crucial and takes precedence over the speed of the writes.

An extremely useful feature of write concern is the ability to trade off, on a case-by-case basis, between the speed of write operations and the consistency of the data written. However, doing so needs a good understanding of the implications of setting a particular write concern. The following diagram reads from left to right and shows increasing levels of write guarantee:

As we move from I to IV, the guarantee for the performed write gets stronger and stronger, but the time taken to execute the write operation, from the client's perspective, gets longer. All write concerns are expressed here as JSON objects, using three different keys, namely, w, j, and fsync. Additionally, another key called wtimeout is used to provide a timeout value for the write operation. Let's see these keys in detail (a short Java sketch showing how they map onto the driver's WriteConcern class follows the list):

  • w: This indicates whether to wait for the server's acknowledgement, whether to report write errors due to data issues, and how many nodes (including secondaries) the data must be replicated to. Its value is usually a number; a special value, majority, is also allowed, which we will see later.

  • j: This is related to journaling and its value can be a Boolean (true/false or 1/0).

  • fsync: This is a Boolean value that determines whether the write should wait until the data is flushed to disk before a response is sent.

  • wtimeout: This specifies the timeout (in milliseconds) for write operations, whereby the driver throws an exception to the client if the server doesn't respond within the provided time. We will see this option in some detail soon.
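
The following is a minimal sketch of how these keys can map onto the WriteConcern class of the 2.x MongoDB Java driver (the driver generation assumed throughout this book). The constructors and named constants shown exist in that driver, but treat the snippet as illustrative rather than as the book's own test code:

    import com.mongodb.WriteConcern;

    public class WriteConcernExamples {
        public static void main(String[] args) {
            // {w:1} - wait for the primary's acknowledgement (the default)
            WriteConcern acknowledged = new WriteConcern(1);

            // {w:1, wtimeout:0, fsync:false, j:true} - wait until the write is in the journal
            WriteConcern journaled = new WriteConcern(1, 0, false, true);

            // {w:1, wtimeout:0, fsync:true, j:false} - wait until the data is flushed to disk
            WriteConcern fsynced = new WriteConcern(1, 0, true, false);

            // {w:3, wtimeout:10000} - wait for three nodes, but no longer than 10 seconds
            WriteConcern threeNodesWithTimeout = new WriteConcern(3, 10000);

            // Named constants for the common cases
            WriteConcern unacknowledged = WriteConcern.UNACKNOWLEDGED; // {w:0}
            WriteConcern majority = WriteConcern.MAJORITY;             // {w:"majority"}
        }
    }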

In part I, which we have demarcated up to the driver, we have two write concerns, namely, {w:-1} and {w:0}. What both these write concerns have in common is that they neither wait for the server's acknowledgement of the write operation, nor do they report any exception caused on the server side, say, by a unique index violation. The client gets an ok response and discovers the write failure only when it queries the database at some later point in time and finds the data missing. The difference lies in how the two respond to a network error. With {w:-1}, the operation doesn't fail and the user receives a write response; however, the response states that a network error prevented the write operation from succeeding and that no retries must be attempted. With {w:0}, on the other hand, the driver might choose to retry the operation and throws an exception to the client if the write fails due to a network error. Both these write concerns give a quick response back to the invoking client at the cost of data consistency, and they are fine for use cases such as logging, where occasional missed log writes are acceptable. In older versions of MongoDB, {w:0} was the default write concern if the invoking client didn't mention one. At the time of writing this book, the default has changed to {w:1}, and the option {w:0} is deprecated.

In part II of the diagram, which falls between the driver and the server, the write concern in question is {w:1}. The driver waits for an acknowledgement from the server that the write operation has completed. Note that the server responding doesn't mean the write was made durable; it means that the change has been applied in memory, all the constraints have been checked, and any exception will be reported to the client, unlike with the previous two write concerns. This is a relatively safe write concern mode that is still fast, but there is a slim chance of the data being lost if the server crashes in those few milliseconds before the data is written from memory to the journal. For most use cases, this is a good option to set, and hence it is the default write concern mode.
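
As a rough illustration (assuming the 2.x Java driver and a mongod running locally; the database and collection names here are made up for the example), this default acknowledged behavior can also be set once on the client, after which it applies to every write that doesn't specify its own write concern:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;
    import com.mongodb.WriteConcern;

    public class DefaultWriteConcern {
        public static void main(String[] args) throws Exception {
            MongoClient client = new MongoClient("localhost", 27017);
            // {w:1} is already the default; we set it explicitly here only for clarity
            client.setWriteConcern(WriteConcern.ACKNOWLEDGED);

            DB db = client.getDB("test");
            DBCollection coll = db.getCollection("writeConcernTest"); // illustrative name
            // This insert waits for the primary's in-memory acknowledgement, so constraint
            // violations (for example, a duplicate _id) raise an exception right here
            coll.insert(new BasicDBObject("_id", "a"));
            client.close();
        }
    }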

Moving on, we come to part III of the diagram, which spans from the entry point into the server up to the journal. The write concern we are looking at here is {j:1} or {j:true}. This write concern ensures that a response is sent to the invoking client only when the write operation has been written to the journal. What is a journal, though? This is something we saw in depth in Chapter 4, Administration; for now, think of it as a mechanism that makes writes durable and ensures that the data on the disk doesn't get corrupted in the event of a server crash.

Finally, let's come to part IV of the diagram; the write concern we are talking about is {fsync:true}. This requires the data to be flushed to disk before the response is sent back to the client. In my opinion, when journaling is enabled, this operation doesn't really add any value, as journaling already ensures data persistence even on a server crash. Only when journaling is disabled does this option ensure that the write operation has truly succeeded by the time the client receives a success response. If the data is really important, journaling should never be disabled in the first place, as it also ensures that the data on the disk doesn't get corrupted.
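
The write concern can also be supplied per operation rather than per client. The following sketch (again assuming the 2.x Java driver; the collection name and documents are made up) contrasts the levels discussed so far:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;
    import com.mongodb.WriteConcern;

    public class PerOperationWriteConcern {
        public static void main(String[] args) throws Exception {
            MongoClient client = new MongoClient("localhost", 27017);
            DBCollection logs = client.getDB("test").getCollection("logs"); // illustrative name

            // Fire-and-forget write, acceptable for low-value data such as log entries
            logs.insert(new BasicDBObject("msg", "page viewed"), WriteConcern.UNACKNOWLEDGED);

            // Wait until the write has been committed to the journal ({j:true})
            logs.insert(new BasicDBObject("msg", "payment recorded"), WriteConcern.JOURNALED);

            // Wait until the data has been flushed to disk ({fsync:true}); this adds little
            // when journaling is enabled, as discussed in the preceding paragraph
            logs.insert(new BasicDBObject("msg", "audit entry"), WriteConcern.FSYNCED);

            client.close();
        }
    }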

We have seen some basic write concerns that apply to a single-node server, or only to the primary node in a replica set.

Note

An interesting case to discuss is a write concern such as {w:0, j:true}: we ask not to wait for the server's acknowledgement and, at the same time, ask for the write to be committed to the journal. In this case, the journaling flag takes precedence and the client waits for the acknowledgement of the write operation. One should avoid setting such ambiguous write concerns to avoid unpleasant surprises.

We will now talk about write concern when it involves secondary nodes of a replica set as well. Let's take a look at the following diagram:

Any write concern with a w value greater than 1 indicates that secondary nodes too need to acknowledge the write before a response is sent back. As seen in the preceding diagram, when a primary node gets a write operation, it propagates that operation to the secondary nodes. As soon as it gets a response from a predetermined number of secondary nodes, it acknowledges to the client that the write has been successful. For example, with a write concern of {w:3}, the client is sent a response only when three nodes in the cluster acknowledge the write. These three nodes include the primary node, so it is down to two secondary nodes to respond for a successful write operation.

However, there is a problem with providing a number for the write concern: we need to know the number of nodes in the cluster and set the value of w accordingly. Too low a value means the write is acknowledged after replicating to only a few nodes; too high a value may unnecessarily slow the response back to the client or, in some cases, never send a response at all. Suppose we have a three-node replica set and a write concern of {w:4}: the server will not send an acknowledgement until the data is replicated to three secondary nodes, which don't exist, as we have just two secondary nodes. Thus, the client waits for a very long time to hear from the server about the write operation. There are a couple of ways to address this problem (a short Java sketch of both options follows this list):

  • Use the wtimeout key and specify a timeout for the write concern. This ensures that a write operation will not block for longer than the time specified (in milliseconds) in the wtimeout field of the write concern. For example, {w:3, wtimeout:10000} ensures that the write operation will not block for more than 10 seconds (10,000 ms), after which an exception is thrown to the client. In the case of Java, a WriteConcernException is thrown with a root cause message stating the reason as a timeout. Note that this exception does not roll back the write operation; it just informs the client that the operation did not complete in the specified amount of time. It might still complete on the server side some time after the client receives the timeout exception. It is up to the application program to deal with the exception and programmatically take corrective steps. The message of the timeout exception conveys some interesting details, which we will see when executing the test program for the write concern.

  • A better way to specify the value of w in the case of replica sets is to give it the value majority. This write concern automatically identifies the number of nodes in the replica set and sends an acknowledgement back to the client once the data is replicated to a majority of nodes. For example, if the write concern is {w:"majority"} and the number of nodes in the replica set is three, the majority is 2 nodes. If, at a later point in time, we increase the number of nodes to five, the majority becomes 3 nodes. The number of nodes that forms a majority is computed automatically when the write concern's value is given as majority.
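
Here is a small sketch of both options (assuming the 2.x Java driver and the three-member replica set used later in this recipe; the ports, names, and documents are illustrative):

    import java.util.Arrays;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;
    import com.mongodb.ServerAddress;
    import com.mongodb.WriteConcern;
    import com.mongodb.WriteConcernException;

    public class ReplicaSetWriteConcerns {
        public static void main(String[] args) throws Exception {
            MongoClient client = new MongoClient(Arrays.asList(
                    new ServerAddress("localhost", 27000),
                    new ServerAddress("localhost", 27001),
                    new ServerAddress("localhost", 27002)));
            DBCollection coll = client.getDB("test").getCollection("writeConcernTest");

            // Option 1: a numeric w with a timeout, {w:3, wtimeout:10000}
            try {
                coll.insert(new BasicDBObject("key", "value"), new WriteConcern(3, 10000));
            } catch (WriteConcernException e) {
                // Thrown if three nodes haven't acknowledged within 10 seconds;
                // the write itself is not rolled back
                System.out.println("Timed out waiting for replication: " + e.getMessage());
            }

            // Option 2: let the server work out the majority, {w:"majority"}
            coll.insert(new BasicDBObject("key", "another value"), WriteConcern.MAJORITY);

            client.close();
        }
    }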

Now, let us put the concepts we discussed to use and execute a test program that will demonstrate some of them.

Setting up a replica set

To set up a replica set, you should know how to start a basic replica set with three nodes. Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server. This recipe builds on that one because it needs additional configuration while starting the replica set, which we will discuss in the next section. Note that the replica set used here has a slightly different configuration from the one you used earlier.

Here, we will use a Java program to demonstrate various write concerns and their behavior. Refer to the Connecting to a single node from a Java client recipe in Chapter 1, Installing and Starting the MongoDB Server, to get Maven set up. This can be a bit inconvenient if you are coming from a non-Java background.

Note

The Java project named Mongo Java is available for download at the book's website. If the setup is complete, you can test the project just by executing the following command:

mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.FirstMongoClient

The code for this project is available for download at the book's website. Download the project named WriteConcernTest and keep it on a local drive ready for execution.

So, let's get started:

  1. Prepare the following configuration file for the replica set. It is identical to the config file that we saw in the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, where we set up the replica set, with just one difference: slaveDelay:5, priority:0 on the third member:

    cfg = {
       _id:'replSetTest',
       members:[
           {_id:0, host:'localhost:27000'},
           {_id:1, host:'localhost:27001'},
           {_id:2, host:'localhost:27002', slaveDelay:5, priority:0}
       ]
    }
    
  2. Use this config to start a three-node replica set, with one node listening on port 27000. The others can be any ports of your choice, but stick to 27001 and 27002 if possible (if you decide to use different port numbers, update the config accordingly). Also, remember to set the name of the replica set as replSetTest for the replSet command-line option while starting the replica set. Give the replica set some time to come up before going ahead with the next step.

  3. At this point, a replica set with the aforementioned specifications should be up and running. We will now execute the test code provided in Java to observe some interesting facts about and behaviors of different write concerns. Note that this program also tries to connect to a port where no Mongo process is listening for connections. The port chosen is 20000; before running the code, ensure that no server is up and running and listening on port 20000.

  4. Go to the root directory of the WriteConcernTest project and execute the following command:

    mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.WriteConcernTests
    

    It should take some time to execute completely, depending on your hardware configuration. It took roughly 35 to 40 seconds on my machine, which has a 7,200 RPM spinning disk drive.

Before we continue analyzing the logs, let us see what the two additional fields added to the config file to set up the replica set were for. The slaveDelay field indicates that this particular slave (the one listening on port 27002 in this case) will lag behind the primary by 5 seconds; that is, the data being replicated to this node at any moment is the data that was added to the primary 5 seconds ago. Secondly, this node can never be a primary, and hence the priority field has to be added with the value 0. We have already seen this in detail in Chapter 4, Administration.

Let us now analyze the output of the preceding command's execution. There is no need to look at the Java class provided; the output on the console is sufficient. Some of the relevant portions of the console output are as follows:

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ mongo-cookbook-wctest ---
Trying to connect to server running on port 20000
Trying to write data in the collection with write concern {w:-1}
Error returned in the WriteResult is NETWORK ERROR
Trying to write data in the collection with write concern {w:0}
Caught MongoException.Network trying to write to collection, message is Write operation to server localhost/127.0.0.1:20000 failed on database test
Connected to replica set with one node listening on port 27000 locally

Inserting duplicate keys with {w:0}
No exception caught while inserting data with duplicate _id
Now inserting the same data with {w:1}
Caught Duplicate Exception, exception message is { "serverUsed" : "localhost/127.0.0.1:27000" , "err" : "E11000 duplicate key error index: test.writeConcernTest.$_id_  dup key: { : \"a\" }" , "code" : 11000 , "n" : 0 , "lastOp" : { "$ts" :1386009990 , "$inc" : 2} , "connectionId" : 157 , "ok" : 1.0}
Average running time with WriteConcern {w:1, fsync:false, j:false} is 0 ms
Average running time with WriteConcern {w:2, fsync:false, j:false} is 12 ms
Average running time with WriteConcern {w:1, fsync:false, j:true} is 40 ms
Average running time with WriteConcern {w:1, fsync:true, j:false} is 44 ms
Average running time with WriteConcern {w:3, fsync:false, j:false} is 5128 ms
Caught WriteConcern exception for {w:5}, with following message { "serverUsed" : "localhost/127.0.0.1:27000" , "n" : 0 , "lastOp" : { "$ts" : 1386009991 , "$inc" : 18} , "connectionId" : 157 , "wtimeout" : true , "waited" : 1004 , "writtenTo" : [ { "_id" : 0 , "host" : "localhost:27000"} , { "_id" : 1 , "host" : "localhost:27001"}] , "err" : "timeout" , "ok" : 1.0}
 [INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36.671s
[INFO] Finished at: Tue Dec 03 00:16:57 IST 2013
[INFO] Final Memory: 13M/33M
[INFO] ------------------------------------------------------------------------

The first statement in the log states that we are trying to connect to a Mongo process listening on port 20000. As no Mongo server should be running and listening on this port for client connections, all our write operations to this server should fail, which gives us a chance to see what happens when we use the write concerns {w:-1} and {w:0} to write to this nonexistent server.

The next two lines of the output show that with the write concern {w:-1}, we do get a write result back, but it contains an error flag set to indicate a network error; however, no exception is thrown. With the write concern {w:0}, on the other hand, the client application does get an exception for the network error. Of course, all other write concerns that ensure a stricter guarantee will throw an exception in this case too.

Now we come to the portion of the code that connects to the replica set with one of its nodes listening on port 27000 (if it cannot connect, the code shows the error on the console and terminates). We now attempt to insert a document with a duplicate _id field ({'_id':'a'}) into a collection, once with the write concern {w:0} and once with {w:1}. As we see on the console, the former ({w:0}) didn't throw an exception and, from the client's perspective, the insert went through successfully, whereas the latter ({w:1}) threw an exception to the client, indicating a duplicate key. The exception contains a lot of information from the time it occurred: the server's hostname and port, the field for which the unique constraint failed, the client connection ID, the error code, and the value that was not unique and caused the exception. The fact is that even the insert performed with {w:0} as the write concern failed; however, as the driver didn't wait for the server's acknowledgement, it was never told about the failure.
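
The actual WriteConcernTests class is not reproduced here, but the duplicate-key portion of the output could be produced by something along these lines (a sketch only, assuming the 2.x Java driver; the database and collection names are assumptions):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoException;
    import com.mongodb.WriteConcern;

    public class DuplicateKeyDemo {
        public static void main(String[] args) throws Exception {
            MongoClient client = new MongoClient("localhost", 27000);
            DBCollection coll = client.getDB("test").getCollection("writeConcernTest");
            coll.drop();

            coll.insert(new BasicDBObject("_id", "a"), WriteConcern.ACKNOWLEDGED);

            // {w:0}: the duplicate _id is rejected by the server, but the client is never told
            coll.insert(new BasicDBObject("_id", "a"), WriteConcern.UNACKNOWLEDGED);
            System.out.println("No exception caught while inserting data with duplicate _id");

            // {w:1}: the same insert now surfaces the E11000 duplicate key error
            try {
                coll.insert(new BasicDBObject("_id", "a"), WriteConcern.ACKNOWLEDGED);
            } catch (MongoException.DuplicateKey e) {
                System.out.println("Caught Duplicate Exception, message is " + e.getMessage());
            }
            client.close();
        }
    }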

Moving on, we now compute the time taken for a write operation to complete. The time shown here is the average time taken to execute the same operation with a given write concern five times. Note that these times will vary across different executions of the program; this method is just meant to give some rough estimates for our study. We can conclude from the output that the time taken for the write concern {w:1} is less than that of {w:2} (asking for an acknowledgement from one secondary node), that the time taken for {w:2} is less than that for {j:true}, which in turn is less than that for {fsync:true}. The next line of the output shows that the average time taken for the write operation to complete is roughly 5 seconds when the write concern is {w:3}. Any guesses on why that is the case? Why does it take so long? The reason is that when w is 3, the client is sent an acknowledgement only when two secondary nodes acknowledge the write operation. In our case, one of the nodes lags behind the primary by about 5 seconds, so it can acknowledge the write only after 5 seconds; hence, the client receives a response from the server in roughly 5 seconds.
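
Such timing figures could come from a simple loop like the following (a sketch under the assumption that averaging over five inserts is good enough; the real test class may measure things differently):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.WriteConcern;

    public class WriteTimer {
        // Returns the average time, in milliseconds, taken to insert one document
        // with the given write concern, averaged over five attempts
        static long averageInsertTimeMillis(DBCollection coll, WriteConcern wc) {
            int attempts = 5;
            long total = 0;
            for (int i = 0; i < attempts; i++) {
                long start = System.currentTimeMillis();
                coll.insert(new BasicDBObject("key", "timing-" + i), wc);
                total += System.currentTimeMillis() - start;
            }
            return total / attempts;
        }
    }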

Let us do a quick exercise here: what do you think the approximate response time would be when the write concern is {w:'majority'}? The hint here is that, for a replica set of three nodes, two is the majority.

Finally, we see a timeout exception. The timeout is set using the wtimeout field of the document and is specified in milliseconds. In our case, we gave a timeout of 1,000 ms, that is, 1 second, and asked for an acknowledgement from five nodes (the primary plus four secondary instances) before the response is sent back to the client. Thus, the write concern is {w:5, wtimeout:1000}. As we have a maximum of three nodes, an operation with the value of w set to 5 would wait for a very long time, until two more secondary instances were added to the cluster. With the timeout set, the client returns, and an exception conveying some interesting details is thrown to the client. The following is the JSON sent as the exception message:

{ "serverUsed" : "localhost/127.0.0.1:27000" , "n" : 0 , "lastOp" : { "$ts" : 1386015030 , "$inc" : 1} , "connectionId" : 507 , "wtimeout" : true , "waited" : 1000 , "writtenTo" : [ { "_id" : 0 , "host" : "localhost:27000"} , { "_id" : 1 , "host" : "localhost:27001"}] , "err" : "timeout" , "ok" : 1.0}

Let us look at the interesting fields. We start with the n field, which indicates the number of documents updated. As this is an insert and not an update, it stays 0. The wtimeout and waited fields tell us that the operation did time out and how long the client waited for a response, in this case 1,000 ms. The most interesting field is writtenTo: at the time the operation timed out, the insert had succeeded on these two nodes of the replica set, and hence they are seen in the array. The third node has a slaveDelay value of 5 seconds, and hence the data has not yet been written to it. This proves that the timeout doesn't roll back the insert; it does go through successfully. In fact, the node with slaveDelay will also have the data after 5 seconds, even though the operation timed out, and this makes perfect sense as it keeps the primary and secondary instances in sync. It is the responsibility of the application to detect such timeouts and handle them.
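
One way an application might detect and handle such a timeout is sketched below (assuming the 2.x Java driver; whether to log, re-verify, or alert afterwards is entirely up to the application):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.WriteConcern;
    import com.mongodb.WriteConcernException;

    public class TimeoutHandling {
        // Inserts a document with {w:5, wtimeout:1000} and treats a timeout as a
        // "possibly written" outcome rather than an outright failure
        static void insertWithTimeout(DBCollection coll, BasicDBObject doc) {
            try {
                coll.insert(doc, new WriteConcern(5, 1000));
            } catch (WriteConcernException e) {
                // The server reported a timeout; the write is not rolled back and may
                // still complete on the remaining nodes. The application decides what
                // to do next: log it, verify the document later, or alert an operator
                System.out.println("Write not replicated to 5 nodes in time: " + e.getMessage());
            }
        }
    }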