MongoDB falls into the group of document-oriented NoSQL databases. It is developed and maintained by 10gen (http://www.10gen.com). It is an open source database written in C++. The source code is licensed under the AGPL and freely available on GitHub; anyone can download it from the repository at https://github.com/mongodb/mongo and customize it to suit their needs. It is increasingly being used as a data storage layer in different kinds of applications, both web-based and non-web-based.
Features that make learning and using MongoDB a win include:
It is easy to learn, at least easier than learning other NoSQL systems, if I dare say. Column-oriented or graph-based databases introduce radical ideas that many developers struggle to grasp. However, there is a lot of similarity between the basic concepts of MongoDB and those of a relational database. Developers coming from an RDBMS background find little trouble adapting to MongoDB.
It implements the idea of a flexible schema. You don't have to define the structure of the data before you start storing it, which makes it very suitable for storing unstructured data.
It is highly scalable. It comes with features that help keep performance optimal as the size and traffic of data grow, with little or no change in the application layer.
It is free: it can be downloaded and used without charge. It has excellent documentation and an active, cooperative online community whose members participate in mailing lists, forums, and IRC chat rooms.
Let's take a look at some real-world use cases of MongoDB:
Craigslist: Craigslist is the world's most popular website for free classified advertisements. It uses MongoDB to archive billions of records. It had previously used a MySQL-based solution for this purpose; replacing that solution with MongoDB has allowed Craigslist to make schema changes without delay and to scale much more easily.
Foursquare: Foursquare is a popular location-based social networking application. It stores the geographical location of interesting venues (restaurants, cafes, and so on) and records when users visit these venues. It uses MongoDB for storing venue and user information.
CERN: The renowned particle physics laboratory based in Geneva uses MongoDB as an aggregation cache for its Large Hadron Collider experiment. The results of expensive aggregation queries, performed on massive amounts of data, are stored in MongoDB for future use.
A MongoDB server hosts a number of databases. The databases act as containers of data and they are independent of each other. A MongoDB database contains one or more collections. For example, a database for a blogging application named myblogsite may typically have the collections articles, authors, comments, categories, and so on.
A collection is a set of documents. It is logically analogous to the concept of a table in a relational database. But unlike tables, you don't have to define the structure of the data that is going to be stored in the collection beforehand.
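The hierarchy described above, and the absence of a predefined structure, can be illustrated with a minimal sketch in plain Python. It does not talk to a MongoDB server; dictionaries and lists simply stand in for databases, collections, and documents, and the field names (title, body, tags, views) are assumptions made up for the illustration.

```python
# A plain-Python model of the hierarchy: server -> databases -> collections -> documents.
# Illustration only: no MongoDB server or driver is involved.
server = {
    "myblogsite": {          # a database
        "articles": [],      # a collection (a set of documents)
        "authors": [],
    }
}

articles = server["myblogsite"]["articles"]

# Documents in the same collection need not share a structure.
# The fields below are hypothetical.
articles.append({"title": "Hello MongoDB", "body": "First post"})
articles.append({"title": "Schemas", "tags": ["nosql", "design"], "views": 42})

# Collect the set of field names used by each document.
field_sets = [set(doc) for doc in articles]
print(field_sets)
```

Both documents live in the same collection even though only one of them has tags and views, which is what a relational table would not allow without a schema change.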
A document stored in a collection is a unit of data. A document contains a set of fields, or key-value pairs. The keys are strings; the values can be of various types: strings, integers, floats, timestamps, and so on. You can even store a document as the value of a field in another document.
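As a rough illustration of these value types, the following Python dictionary mimics a document with an embedded document inside it. This is a sketch of the idea rather than MongoDB's own representation, and every field name in it is hypothetical.

```python
from datetime import datetime, timezone

# A document is a set of key-value pairs; a value may itself be a document.
# All field names here are hypothetical.
article = {
    "title": "Getting started",                                # string
    "views": 120,                                              # integer
    "rating": 4.5,                                             # float
    "published": datetime(2011, 4, 24, tzinfo=timezone.utc),   # timestamp
    "author": {                                                # embedded document
        "name": "Joe",
        "email_verified": True,
    },
}

# Nested fields are reached by walking the keys.
author_name = article["author"]["name"]

# Map each top-level field to the name of its Python type.
value_types = {key: type(value).__name__ for key, value in article.items()}
print(author_name, value_types)
```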
Let's take a closer look at a MongoDB document. The following is an example of a document that stores certain information about a user in a web application:
{
    _id : ObjectId("4db31fa0ba3aba54146d851a"),
    username : "joegunchy",
    email : "[email protected]",
    age : 26,
    is_admin : true,
    created : "Sun Apr 24 2011 01:52:58 GMT+0700 (BDST)"
}
The previous document has six fields. If you have some JavaScript experience, you will recognize the structure as JSON, or JavaScript Object Notation. The value of the first field, _id, is autogenerated: unless you supply one yourself, MongoDB automatically generates an ObjectId for each document you create in a collection and assigns it as the _id of that document. This value is also unique, which means that no two documents in the same collection will have the same _id, just like the primary key of a table in a relational database. The next two fields, username and email, are strings; age is an integer; and is_admin is a Boolean. Finally, created is a JavaScript Date object, represented as a string.
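One detail about ObjectId is worth a short example: according to MongoDB's documentation, the first four bytes of an ObjectId hold its creation time as big-endian seconds since the Unix epoch. The sketch below decodes that timestamp from the _id of the example document using only the Python standard library. The helper name objectid_timestamp is hypothetical and is not part of any MongoDB driver.

```python
from datetime import datetime, timezone

def objectid_timestamp(hex_id: str) -> datetime:
    """Return the creation time embedded in a 24-character ObjectId hex string.

    Hypothetical helper for illustration; drivers ship their own ObjectId types.
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    # The first 4 bytes are big-endian seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created_at = objectid_timestamp("4db31fa0ba3aba54146d851a")
print(created_at.isoformat())
```

For the example _id, the decoded time falls on 23 April 2011 in UTC, which agrees with the created field once its GMT+0700 offset is taken into account.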
We have already seen that the structure of a document imitates a JSON object. When you store this document in the database, it is serialized into a special binary encoded format, known as BSON, short for binary JSON. BSON is the default data exchange format for MongoDB. The key advantage of BSON is that it is designed to be fast to encode, decode, and traverse compared with text formats such as XML and JSON, although a BSON document is not always smaller than its JSON equivalent. Also, BSON covers the data types of JSON (string, number, Boolean, array, object, null), distinguishes integers from doubles, and adds some special data types such as regular expression, object ID, date, binary data, and code. Programming languages such as PHP, Python, Java, and so on have libraries that manage conversion of language-specific data structures (for example, the associative array in PHP) to and from BSON. This enables the languages to easily communicate with MongoDB and manipulate the data in it.
Note
If you are interested in learning more about the BSON format, visit http://bsonspec.org/.
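To give a feel for what BSON serialization involves, here is a deliberately small encoder written against the layout described in the BSON specification: a document is a little-endian int32 byte count, a sequence of elements, and a trailing null byte. This is a sketch, not the implementation any driver uses. It handles only a handful of types, and the function names encode_element and encode_document are hypothetical.

```python
import struct

def encode_element(name: str, value) -> bytes:
    """Encode one key-value pair as a BSON element: type byte, key, payload."""
    key = name.encode("utf-8") + b"\x00"             # keys are null-terminated strings
    if isinstance(value, bool):                      # check bool before int (bool is an int subclass)
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):                       # this sketch supports 32-bit integers only
        return b"\x10" + key + struct.pack("<i", value)
    if isinstance(value, float):                     # 64-bit double
        return b"\x01" + key + struct.pack("<d", value)
    if isinstance(value, str):                       # int32 length, UTF-8 bytes, null terminator
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if value is None:                                # null carries no payload
        return b"\x0a" + key
    if isinstance(value, dict):                      # embedded document
        return b"\x03" + key + encode_document(value)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")

def encode_document(doc: dict) -> bytes:
    """Encode a dict as a BSON document: int32 total size, elements, null byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    total = 4 + len(body) + 1                        # size prefix + elements + terminator
    return struct.pack("<i", total) + body + b"\x00"

encoded = encode_document({"hello": "world"})
print(encoded)
```

The length prefixes are what make BSON quick to traverse: a reader can skip over a string or an embedded document without scanning its contents. In real applications, the BSON library that ships with your language's MongoDB driver should be used instead of hand-rolled code like this.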
Developers with a background in relational database systems will quickly recognize the similarities between the logical abstractions of the relational data model and the Mongo data model. The next figure compares components of a relational data model with those of the Mongo data model:
The next figure shows how a single row of a hypothetical table named users is mapped into a document in a collection:
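The same row-to-document mapping can be sketched in a few lines of Python: each column name becomes a field name, and the value in that column becomes the field's value. The column names, the row values, and the helper row_to_document are all made up for this illustration.

```python
# Column names and one row of a hypothetical `users` table (values are made up).
columns = ("id", "username", "age", "is_admin")
row = (1, "joegunchy", 26, True)

def row_to_document(columns, row):
    """Pair each column name with the row's value to form a document-like dict."""
    if len(columns) != len(row):
        raise ValueError("row and column counts differ")
    return dict(zip(columns, row))

document = row_to_document(columns, row)
print(document)
```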
Also, just like the columns of an RDBMS table, the fields of a collection can be indexed, although the implementations of indexing are different.
So much for the similarities; now let's talk briefly about the differences. The key thing that distinguishes MongoDB from a relational model is the absence of relationship constraints. There are no foreign keys in a collection and, as a result, there are no JOIN queries. Constraint management is typically handled in the application layer. Also, because of its flexible schema, there is no expensive ALTER TABLE statement in MongoDB.
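What handling constraints in the application layer means in practice can be sketched as follows. Two collections are modelled as plain Python lists of dictionaries (no MongoDB server is involved), and the application itself resolves the reference from an article to its author and detects references that point nowhere. The collection contents, the author_id field, and the helper attach_author are assumptions made up for the example.

```python
# Two hypothetical collections, modelled as lists of dicts.
authors = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob"},
]
articles = [
    {"_id": 10, "title": "Intro to MongoDB", "author_id": 1},
    {"_id": 11, "title": "Schema design", "author_id": 2},
    {"_id": 12, "title": "Orphaned post", "author_id": 99},  # no such author
]

# Build a lookup table once, then resolve references in application code.
authors_by_id = {author["_id"]: author for author in authors}

def attach_author(article):
    """Return a copy of the article with an 'author' key (None if the author is missing)."""
    author = authors_by_id.get(article["author_id"])
    return {**article, "author": author}

joined = [attach_author(article) for article in articles]

# The application, not the database, has to detect dangling references.
dangling = [article["_id"] for article in joined if article["author"] is None]
print(dangling)
```

Nothing stopped the third article from being stored with an author_id that matches no author, which is exactly the kind of check a foreign key would perform in a relational database.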