Book Image

Mastering Apache Cassandra

By : Nishant Neeraj
Book Image

Mastering Apache Cassandra

By: Nishant Neeraj

Overview of this book

<p>Apache Cassandra is the perfect choice for building fault tolerant and scalable databases. Implementing Cassandra will enable you to take advantage of its features which include replication of data across multiple datacenters with lower latency rates. This book details these features that will guide you towards mastering the art of building high performing databases without compromising on performance.</p> <p>Mastering Apache Cassandra aims to give enough knowledge to enable you to program pragmatically and help you understand the limitations of Cassandra. You will also learn how to deploy a production setup and monitor it, understand what happens under the hood, and how to optimize and integrate it with other software.</p> <p>Mastering Apache Cassandra begins with a discussion on understanding Cassandra’s philosophy and design decisions while helping you understand how you can implement it to resolve business issues and run complex applications simultaneously.</p> <p>You will also get to know about how various components of Cassandra work with each other to give a robust distributed system. The different mechanisms that it provides to solve old problems in new ways are not as twisted as they seem; Cassandra is all about simplicity. Learn how to set up a cluster that can face a tornado of data reads and writes without wincing.</p> <p>If you are a beginner, you can use the examples to help you play around with Cassandra and test the water. If you are at an intermediate level, you may prefer to use this guide to help you dive into the architecture. To a DevOp, this book will help you manage and optimize your infrastructure. To a CTO, this book will help you unleash the power of Cassandra and discover the resources that it requires.</p>
Table of Contents (17 chapters)
Mastering Apache Cassandra
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Introduction to Cassandra


Quoting from Wikipedia:

Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.

It may be too complicated to digest as a one-liner. Let's break it into pieces and see what it means.

Distributed database

In computing, distributed means something spread across more than one machine—it may be data or processes. In the context of Cassandra, it means that the data is distributed across multiple machines. So, why does it matter? This relates to many things: it means that no single node (a machine in a cluster is usually called a node) holds all the data, but just a chunk of it. It means that you are not limited by the storage and processing capabilities of a single machine. If data gets larger, add more machines. Need more parallelism, add more machines. This means that a node going down does not mean that all the data is lost (we will cover this issue soon).

If a distributed mechanism is well designed, it will scale with a number of nodes. Cassandra is one of the best examples of such a system. It scales almost linearly with regard to performance, when we add new nodes. This means Cassandra can handle the behemoth of data without wincing.

Note

Check out an excellent paper on the NoSQL database comparison titled Solving Big Data Challenges for Enterprise Application Performance Management at http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf.

High availability

We will discuss availability in the next chapter. For now, assume availability is the probability that we query and the system just works. A high availability system is the one that is ready to serve any request at any time. High availability is usually achieved by adding redundancies. So, if one part fails, the other part of the system can serve the request. To a client, it seems as if everything worked fine.

Cassandra is a robust software. Nodes joining and leaving are automatically taken care of. With proper settings, Cassandra can be made failure resistant. That means that if some of the servers fail, the data loss will be zero. So, you can just deploy Cassandra over cheap commodity hardware or a cloud environment, where hardware or infrastructure failures may occur.

Replication

Continuing from the last two points, Cassandra has a pretty powerful replication mechanism (we will see more details in the next chapter). Cassandra treats every node in the same manner. Data need not be written on a specific server (master) and you need not wait until the data is written to all the nodes that replicate this data (slaves).

So, there is no master or slave in Cassandra and replication happens asynchronously. This means that the client can be returned with success as a response as soon as the data is written on at least one server. We will see how we can tweak these settings to ensure the number of servers we want to have data written on before the client returns.

We can derive a couple of things from this: when there is no master or slave, we can write to any node for any operation. Since we have the ability to choose how many nodes to read from or write to, we can tweak it to achieve very low latency (read or write from one server).

Multiple data centers

Expanding from a single machine to a single data center cluster or multiple data center is very trivial. We will see later in this book that we can use this data center setting to make a real-time replicating system across data centers. We can use each data center to perform different tasks without overloading the other data centers. This is a powerful support when you do not have to worry about whether the users in Japan with a data center in Tokyo and the users in the US with a data center in Virginia are in sync or not.

These are just broad strokes of Cassandra's capabilities. We will explore more in the upcoming chapters. This chapter is about getting excited.