Book Image

Seven NoSQL Databases in a Week

By : Sudarshan Kadambi, Xun (Brian) Wu
Book Image

Seven NoSQL Databases in a Week

By: Sudarshan Kadambi, Xun (Brian) Wu

Overview of this book

This is the golden age of open source NoSQL databases. With enterprises having to work with large amounts of unstructured data and moving away from expensive monolithic architecture, the adoption of NoSQL databases is rapidly increasing. Being familiar with the popular NoSQL databases and knowing how to use them is a must for budding DBAs and developers. This book introduces you to the different types of NoSQL databases and gets you started with seven of the most popular NoSQL databases used by enterprises today. We start off with a brief overview of what NoSQL databases are, followed by an explanation of why and when to use them. The book then covers the seven most popular databases in each of these categories: MongoDB, Amazon DynamoDB, Redis, HBase, Cassandra, In?uxDB, and Neo4j. The book doesn't go into too much detail about each database but teaches you enough to get started with them. By the end of this book, you will have a thorough understanding of the different NoSQL databases and their functionalities, empowering you to select and use the right database according to your needs.
Table of Contents (16 chapters)
Title Page
Copyright and Credits
Dedication
Packt Upsell
Contributors
Preface
Index

Reads and writes


Let's look at the internal mechanics of how reads and writes are executed within a RegionServer instance.

The HBase write path

HDFS is an append-only file system, so how could a database that supports random record updates be built on top of it?

HBase is what's called a log-structured merge tree, or an LSM, database. In an LSM database, data is stored within a multilevel storage hierarchy, with movement of data between levels happening in batches. Cassandra is another example of an LSM database.

When a write for a key is issued from the HBase client, the client looks up Zookeeper to get the location of the RegionServer that hosts the META region. It then queries the META region to find out a table's regions, their key ranges, and the RegionServers they are hosted on.

The client then makes an RPC call to the RegionServer that contains the key in the write request. The RegionServer receives the data for the key, immediately persists this in an in-memory structure called the Memstore...