Book Image

Mastering MongoDB 7.0 - Fourth Edition

By : Marko Aleksendrić, Arek Borucki, Leandro Domingues, Malak Abu Hammad, Elie Hannouch, Rajesh Nair, Rachelle Palmer
Book Image

Mastering MongoDB 7.0 - Fourth Edition

By: Marko Aleksendrić, Arek Borucki, Leandro Domingues, Malak Abu Hammad, Elie Hannouch, Rajesh Nair, Rachelle Palmer

Overview of this book

Mastering MongoDB 7.0 explores the latest version of MongoDB, an exceptional NoSQL database solution that aligns with the needs of modern web applications. This book starts with an informative overview of MongoDB’s architecture and developer tools, guiding you through the process of connecting to databases seamlessly. This MongoDB book explores advanced queries in detail, including aggregation pipelines and multi-document ACID transactions. It delves into the capabilities of the MongoDB Atlas developer data platform and the latest features, such as Atlas Vector Search, and their role in AI applications, enabling developers to build applications with the scalability and performance that today’s organizations need. It also covers the creation of resilient search functionality using MongoDB Atlas Search. Mastering MongoDB 7.0’s deep coverage of advanced techniques encompasses everything from role-based access control (RBAC) to user management, auditing practices, and encryption across data, network, and storage layers. By the end of this book, you’ll have developed the skills necessary to create efficient, secure, and high-performing applications using MongoDB. You’ll have the confidence to undertake complex queries, integrate robust applications, and ensure data security to overcome modern data challenges.
Table of Contents (20 chapters)
4
Chapter 4: Connecting to MongoDB

Efficiency of the inherent complexity of MongoDB databases

The most interesting part of the modern database is understanding its architecture and why it's built that way. Fundamentally, MongoDB is a distributed system. The database server itself was originally built with the anticipation that most users would run it with a default configuration—replica set, sometimes also referred to as a cluster. When you explore this architecture in-depth, you'll notice the true complexities.

By default, replica set of MongoDB is a three-node configuration. All three nodes are data-bearing, which means that there is a complete copy of the database available on each node. Each database is hosted on a separate instance or host, which can be in the same availability zone, data center, or region. This default configuration is to ensure both redundancy and high availability. Chapter 2, The MongoDB Architecture will discuss replica sets in more detail.

If one of the instances becomes unresponsive or unavailable, a healthy node is promoted to become the primary node. This failover between members occurs automatically, and there's no impact on operations for the users of the database. This process considers many different factors, including node availability, data freshness, and responsiveness. This election process and protocol, while simple to understand at a high-level, is very nuanced. But since the operations continue without interruption, you hardly know or understand these details.

How is this possible?

Behind the scenes, write operations to MongoDB are propagated from the primary node to the secondary nodes via a process called replication. The best way to explain replication is with the example of a single write to the database. An inbound write from the client application (your app) will be first directed to the primary node. That primary node will apply the write to its copy of the database. Then, the write is recorded in the operations log (oplog), which is tailed by secondary nodes.

Replication in MongoDB is based on the RAFT consensus protocol. One particular example of how this implementation varies is leader elections. In the traditional RAFT protocol, leader and primary node election occurs through a combination of randomized election timeouts and message exchanges. In MongoDB, there are settings for node priority. This priority is considered along with data freshness and response time when electing a primary node.

It is often true that the write operation is not written simultaneously to all nodes—there is a lag heavily influenced by factors such as network latency, the distance between nodes, hardware configuration, and workload. If one of the mongod nodes falls behind, it will catch up or resync itself when it is able to do so using the oplog to determine the gaps in its operations. The MongoDB system monitors the replication lag between nodes to track this metric and assess whether the delay between primary and secondary nodes is acceptable, and if not, takes necessary action. This process is unique among databases as well.

This default configuration of MongoDB is a replica set with three members, where replication of data between nodes and failover between nodes are all handled automatically. This configuration is both durable and highly available, which makes it easy to use. For developers who require larger, global deployments, MongoDB has a sharded cluster model. The first thing to understand is that a sharded cluster consists of replica sets. It is a way of further dividing your data into effectively replicated partitions.

Figure 1.1: Replicated partitions set with primary and secondary nodes

If you require a global deployment with multiple terabytes of data, get started with Chapter 2, The MongoDB Architecture. It will cover how to split data, how to migrate data between regions or shards, how to marry data from multiple regions for analytics, and the performance of sharded cluster architectures.