MongoDB Administrator's Guide

By: Cyrus Dasadia

Overview of this book

MongoDB is a high-performance, feature-rich NoSQL database that forms the backbone of the systems that power many different organizations. It is packed with features that have become essential for many types of software professionals, and it is easy to use. This cookbook contains more than 100 recipes that address the everyday challenges of working with MongoDB. Starting with database configuration, you will understand the indexing aspects of MongoDB. The book also includes practical recipes on how to optimize database query performance, perform diagnostics, and debug queries. You will learn how to implement the core administration tasks required for high availability and scalability, achieved through replica sets and sharding, respectively. You will also implement server security concepts such as authentication, user management, role-based access models, and TLS configuration, and learn how to back up and recover your database efficiently and monitor server performance. By the end of this book, you will have all the information you need, along with tips, tricks, and best practices, to implement a high-performance MongoDB solution.

Choosing the right MongoDB storage engine


Starting with MongoDB version 3.0, a new storage engine named WiredTiger became available, and in version 3.2 it became the default storage engine. Until then, MMAPv1 was the default. I will give you a brief rundown of the main features of both storage engines, which should be enough to help you decide which one suits your application's requirements.

WiredTiger

WiredTiger allows multiple clients to perform write operations on the same collection at the same time. It achieves this through document-level concurrency: during a write operation, the database locks only the affected document, whereas its predecessors would lock the entire collection. This drastically improves performance for write-heavy applications. Additionally, WiredTiger compresses data for both indexes and collections. The compression algorithms currently used by WiredTiger are Google's Snappy and zlib. Although it is possible to disable compression, you should not do so until you have load-tested the change as part of planning your storage strategy.
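As a minimal sketch of how the compression choice can be expressed per collection, the helper below builds the options document that MongoDB's create command accepts for WiredTiger. The function name wiredtiger_collection_options and the allowed-values set are illustrative assumptions, limited to the two compressors named above plus "none" for disabled compression.

```python
# Hypothetical helper: builds the storageEngine options for a new collection.
# Only the compressors discussed in the text are allowed here (an assumption).
ALLOWED_COMPRESSORS = {"snappy", "zlib", "none"}


def wiredtiger_collection_options(block_compressor: str = "snappy") -> dict:
    """Return create-collection options selecting a WiredTiger block compressor."""
    if block_compressor not in ALLOWED_COMPRESSORS:
        raise ValueError(f"unsupported compressor: {block_compressor!r}")
    return {
        "storageEngine": {
            "wiredTiger": {
                # WiredTiger takes its settings as a configuration string.
                "configString": f"block_compressor={block_compressor}"
            }
        }
    }


if __name__ == "__main__":
    print(wiredtiger_collection_options("zlib"))
```

With a driver such as PyMongo, a document like this would typically be passed as extra keyword arguments when creating the collection, for example db.create_collection("events", **wiredtiger_collection_options("zlib")), where "events" is a made-up collection name. Creating one collection per compressor in this way is a convenient starting point for the load testing recommended above.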

WiredTiger uses Multi-Version Concurrency Control (MVCC), which gives operations a point-in-time snapshot of the data. These snapshots are written to disk as checkpoints. Checkpoints identify the last good state of the data files and help recover data after an abnormal shutdown. WiredTiger also supports journaling, in which write-ahead transaction logs are maintained. The combination of journaling and checkpoints increases the chance of data recovery after a failure. WiredTiger uses its own internal cache as well as the filesystem cache to respond to queries faster. It was designed with high concurrency in mind, so its architecture makes better use of multi-core systems.
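The interplay between checkpoints and the journal can be illustrated with a toy model. This is a sketch of the general idea only, not WiredTiger's actual implementation; the ToyStore class and its method names are invented for illustration.

```python
class ToyStore:
    """Toy model of checkpoint plus write-ahead journal recovery (not WiredTiger)."""

    def __init__(self):
        self.data = {}        # volatile in-memory state, lost on a crash
        self.checkpoint = {}  # durable snapshot of the last good state
        self.journal = []     # durable log of writes since the last checkpoint

    def write(self, key, value):
        # Write-ahead: record the change in the journal before applying it.
        self.journal.append((key, value))
        self.data[key] = value

    def take_checkpoint(self):
        # Persist a consistent snapshot; older journal entries are no longer needed.
        self.checkpoint = dict(self.data)
        self.journal.clear()

    def crash_and_recover(self):
        # Start from the last checkpoint, then replay the journal in order.
        self.data = dict(self.checkpoint)
        for key, value in self.journal:
            self.data[key] = value
        return self.data


if __name__ == "__main__":
    store = ToyStore()
    store.write("a", 1)
    store.take_checkpoint()
    store.write("b", 2)
    store.write("a", 3)
    print(store.crash_and_recover())
```

In this model, writes made after the last checkpoint survive the simulated crash only because they were journaled first, which is why the two mechanisms together improve the chance of recovery.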

MMAPv1

MMAPv1 is quite mature and has proven stable over the years. One of the storage allocation strategies used with this engine is the power of two allocation strategy. Each document is given a record whose size is rounded up to a power of two, so the record is usually larger than the document and in-place updates become highly likely without having to move the document. Another storage strategy used with this engine is fixed sizing. Here, documents are padded (for example, with zeros) so that each document is allocated its maximum size up front. This strategy is usually followed by applications that have fewer updates.
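The power of two strategy can be sketched in a few lines. This is an illustrative approximation: the function name power_of_two_allocation and the 32-byte minimum are assumptions, and the real engine also accounts for record headers and treats very large documents differently.

```python
def power_of_two_allocation(doc_size: int, minimum: int = 32) -> int:
    """Round a document size in bytes up to the next power of two.

    The 32-byte minimum is an assumption made for this sketch.
    """
    if doc_size <= 0:
        raise ValueError("doc_size must be positive")
    size = minimum
    while size < doc_size:
        size *= 2
    return size


if __name__ == "__main__":
    for doc_size in (10, 100, 128, 129):
        allocated = power_of_two_allocation(doc_size)
        # The difference is the headroom available for in-place growth.
        print(doc_size, allocated, allocated - doc_size)
```

A 100-byte document, for instance, gets a 128-byte record under this scheme, leaving 28 bytes of room to grow before it has to be moved.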

Consistency in MMAPv1 is achieved by journaling: writes are first applied to a private view in memory and then written to the on-disk journal. After that, the changes are written to a shared view, which maps to the data files. MMAPv1 does not support data compression. Lastly, MMAPv1 relies heavily on the operating system's page cache and so uses available memory to keep the working dataset cached, which provides good performance. MongoDB does, however, yield (free up) memory used for cache if another process demands it. Some production deployments avoid enabling swap space so that these caches are not written to disk, which may degrade performance.

The verdict

So which storage engine should you choose? Given the points above, I personally feel that you should go with WiredTiger, as document-level concurrency alone is a good indicator of better performance. However, as with all engineering decisions, you should not shy away from load testing your application against both storage engines.

Note

The enterprise version of MongoDB also provides an in-memory storage engine and supports encryption at rest. These are good features to have, depending on your application's requirements.