MongoDB Data Modeling

By: Wilson da Rocha França

Overview of this book

This book covers the basic concepts in data modeling and also provides you with the tools to design better schemas. With a focus on data usage, this book will cover how queries and indexes can influence the way we design schemas, with thorough examples and detailed code.

The book begins with a brief discussion of data models, drawing a parallel between relational databases, NoSQL, and consequently MongoDB. Next, the book explains the most basic MongoDB concepts, such as read and write operations, indexing, and how to design schemas by knowing how applications will use the data. Finally, we will talk about best practices that will help you optimize and manage your database, presenting you with a real-life example of data modeling on a real-time logging analytics application.
Table of Contents (16 chapters)

Introducing NoSQL (Not Only SQL)


Although the concept is new, NoSQL is a highly controversial subject. If you search widely, you may find many different explanations. As we do not intend to create a new one, let's take a look at the most commonly used explanation.

The term NoSQL, as we know it today, was introduced by Eric Evans after a meetup organized by Johan Oskarsson from Last.fm.

Indeed, Oskarsson and everyone else who joined that historic meeting in San Francisco on June 11, 2009, were already discussing many of the databases that we now call NoSQL databases, such as Cassandra, HBase, and CouchDB. As Oskarsson described it, the meeting was about open source, distributed, non-relational databases, for anyone who had "… run into limitations with traditional relational databases…," with the aim of "… figuring out why these newfangled Dynamo clones and BigTables have become so popular lately."

Four months later, Evans wrote in his weblog that, despite the growth of the NoSQL movement and everything that was being discussed, he thought they were going nowhere. However, Emil Eifrem, the Neo4J founder and CEO, was right to read the term as "Not Only SQL."

Emil Eifrem's post on Twitter introducing the term "Not Only SQL"

More important than settling on a definition of the term NoSQL, all these events were a starting point from which to discuss what NoSQL really is. Nowadays, there seems to be a general understanding that NoSQL was born as a response to the problems that relational databases were not designed to address.

Notably, the problems that information systems must solve today are clearly different from those of the 1970s. At that time, monolithic architectures were enough to meet demand, which is no longer the case.

Have you ever stopped to think how many websites, such as social networks, e-mail providers, streaming services, and online games, you already have an account with? And, how many devices inside your house are connected to the Internet right now?

Do not worry if you cannot answer the preceding questions precisely. You are not alone. With each new study, the number of users with Internet access around the globe increases, and the share of that access that comes from mobile devices grows too.

This means that a large volume of unstructured or semi-structured data is generated every second, everywhere. The amount of data is hard to estimate, since the user is the main source of information. Thus, it is getting more and more difficult to predict when or why this volume will vary. All it takes is an unpredictable event somewhere in the world (a goal being scored, a general strike, a mass demonstration, or a plane crash) to cause a variation in traffic, and consequently a growth in user-generated content.

In response to this, the development of NoSQL technology brought a variety of different approaches.

NoSQL database types

As previously stated, Amazon Inc. and Google are at the forefront of NoSQL development with the help of Amazon DynamoDB and Google BigTable. Because of the diversity of styles, new types of NoSQL databases are being developed all the time. However, four basic types, based on data model, are known: key-value stores, wide-column stores, document databases, and graph databases, which are explained as follows:

  • Key-value stores: The key-value model is one of the simplest and most straightforward data models, where each record is stored as a key together with its value. Examples of key-value stores are Amazon Dynamo, Riak, and Redis.

    Tip

    Redis can be described as an advanced key-value cache and store. Since the values held under its keys can be of many different data types, and Redis can run atomic operations on these types, we may consider Redis to be a data structure server.

  • Wide-column stores: Conceptually, these are the closest to relational databases, since their data is represented in a table. Nevertheless, the database stores columns of data instead of rows. Examples of wide-column stores are Google BigTable, Cassandra, and HBase.

  • Document databases: As the name suggests, the main concept in the data model of this type of database is the document. Documents are complex structures that store data as key-values, and can contain many key-value pairs, key-array pairs, or even nested documents. Examples of document databases are MongoDB, Apache CouchDB, and Amazon SimpleDB.

  • Graph databases: Graph databases are the best way to store items of data whose relationships are best represented as graphs, such as network topologies and social networks. Nodes, edges, and properties are the structure of stored data. Examples of graph databases are Neo4J and HyperGraphDB.
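To make the four data models above more concrete, here is a minimal sketch that mimics each one with plain Python structures. It does not use any real database driver; all names and values (`key_value`, `wide_column`, `document`, `nodes`, `edges`, `neighbors`, the sample user data) are made up for illustration.

```python
# Key-value store: a single value looked up by a single key.
key_value = {"user:1001": "Ana"}

# Wide-column store: row key -> column family -> column -> value.
wide_column = {
    "user:1001": {
        "profile": {"name": "Ana", "city": "Recife"},
        "activity": {"last_login": "2015-06-01"},
    }
}

# Document database: key-value pairs, key-array pairs, and nested documents.
document = {
    "_id": 1001,
    "name": "Ana",
    "tags": ["admin", "editor"],
    "address": {"city": "Recife", "country": "BR"},
}

# Graph database: nodes with properties, and edges that relate them.
nodes = {1: {"name": "Ana"}, 2: {"name": "Bruno"}}
edges = [(1, "FOLLOWS", 2)]


def neighbors(node_id, relation):
    """Return the ids of nodes reachable from node_id through the given relation."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]
```

The point of the sketch is only the shape of the data: a flat lookup, a table split into column families, a nested record, and a set of relationships that can be traversed with a call such as `neighbors(1, "FOLLOWS")`.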

Dynamic schema, scalability, and redundancy

Although, as explained earlier, NoSQL database types are based on different data models, they have some common features.

In order to support unstructured or semi-structured data, NoSQL databases have no predefined schema. The dynamic schema makes real-time changes simpler when inserting new data, and more cost-effective when data migration is needed.
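As a rough illustration of a dynamic schema, the following sketch uses a plain Python list as a stand-in for a schemaless collection. The names `collection` and `find_with_field`, and the sample records, are hypothetical and not part of any database API.

```python
# A list of dictionaries stands in for a collection with no predefined schema.
collection = []

# The first record has only a name and an e-mail address.
collection.append({"name": "Ana", "email": "ana@example.com"})

# A later record adds new fields; no migration of older records is needed.
collection.append(
    {"name": "Bruno", "phones": ["555-0100"], "address": {"city": "Recife"}}
)


def find_with_field(records, field):
    """Return the records that actually contain the given field."""
    return [record for record in records if field in record]
```

Records with different fields live side by side, so the application, rather than the database, decides how to treat a record that lacks a field.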

To handle an unpredictable, large volume of data, many NoSQL databases use auto-sharding to scale horizontally and ensure continuous availability of data. Auto-sharding automatically spreads data and traffic across a number of servers.
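The idea of spreading data across servers can be sketched with simple hash-based placement. This is only one possible strategy, shown under stated assumptions: the server names in `SHARDS` and the functions `shard_for` and `distribute` are hypothetical, and real systems also handle rebalancing when servers are added or removed, which this sketch ignores.

```python
import hashlib

# Hypothetical server names; a real cluster would discover these dynamically.
SHARDS = ["server-a", "server-b", "server-c"]


def shard_for(key, shards=SHARDS):
    """Pick a shard for a key by hashing the key and taking the remainder."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distribute(keys, shards=SHARDS):
    """Group keys by the shard that would store them."""
    placement = {name: [] for name in shards}
    for key in keys:
        placement[shard_for(key, shards)].append(key)
    return placement
```

Because the shard is computed from the key alone, any client can locate a record without a central lookup, and the same key always maps to the same server as long as the list of shards does not change.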

NoSQL databases also support replication natively, which gives you high availability and quick, easy recovery. As we distribute our data more widely and our recovery strategies change, we may fine-tune our consistency levels.
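The trade-off between replication and consistency can be sketched with a toy in-memory model. This is not how any particular database implements replication; the class `ReplicaSet`, its methods, and the `acks` parameter are assumptions made for illustration only.

```python
class ReplicaSet:
    """Toy model: one primary plus replicas, each a plain dictionary."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value, acks=1):
        """Write to the primary, then copy synchronously to acks - 1 replicas.

        acks=1 waits only for the primary; a higher value trades write
        latency for stronger consistency on reads from replicas.
        """
        if not 1 <= acks <= len(self.replicas) + 1:
            raise ValueError("acks must be between 1 and the number of nodes")
        self.primary[key] = value
        for replica in self.replicas[: acks - 1]:
            replica[key] = value

    def sync(self):
        """Simulate background replication catching up on every replica."""
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key, replica_index=None):
        """Read from the primary, or from a replica that may be stale."""
        if replica_index is None:
            node = self.primary
        else:
            node = self.replicas[replica_index]
        return node.get(key)
```

With `acks=1`, a read from a replica may return nothing until `sync()` runs; requiring acknowledgement from every node removes that window at the cost of slower writes. Choosing a point between those extremes is what fine-tuning the consistency level amounts to in this sketch.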