Apache Cassandra Essentials

By: Nitin Padalia

Overview of this book

Apache Cassandra Essentials takes you step-by-step from the basics of installation to advanced installation options and database design techniques. It gives you all the information you need to effectively design a well-distributed, high-performance database. You'll learn the steps a Cassandra node performs when you execute a read/write query, which is essential for properly maintaining a Cassandra cluster and debugging any issues. Next, you'll discover how to integrate a Cassandra driver into your applications and perform read/write operations. Finally, you'll learn about the various tools Cassandra provides for serviceability aspects such as logging, metrics, backup, and recovery.
Table of Contents (14 chapters)
Apache Cassandra Essentials
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Preface

Traditional database management systems can become a bottleneck for modern applications, which need to be highly available, scalable, and highly responsive at the same time; these systems often cannot meet such applications' storage and retrieval needs with all of these attributes. Apache Cassandra, a highly available, massively scalable, query-driven NoSQL database, helps our applications achieve these must-have attributes. Cassandra's core features include handling large volumes of data while letting us configure responsiveness, scalability, and high availability to suit our requirements.

In this book, I've provided step-by-step information, starting from the basic installation and moving to advanced installation options and database design techniques. It gives all the information that you will need to design a well-distributed, high-performance database. This book focuses on explaining core concepts with simple, easy-to-understand examples. I've also included code examples in this book, which you can use in your day-to-day work with Cassandra.

What this book covers

Chapter 1, Getting Your Cassandra Cluster Ready, gives an introduction to Cassandra and helps you to set up your cluster. It also introduces the various configuration options available for setting up your cluster, which you can refer to when fine-tuning it.

Chapter 2, An Architectural Overview, helps you to understand the internal architecture of a Cassandra cluster. It details the strategies Cassandra uses to distribute data among the nodes in the cluster and describes how Cassandra achieves high availability through its replication strategies.

Chapter 3, Creating Database and Schema, details the core concepts Cassandra uses to organize data. We'll learn to use CQL (Cassandra Query Language), which Cassandra clients use to describe data models, to create our databases and tables. We'll also discuss the various techniques Cassandra provides, which can be chosen based on our storage and data retrieval requirements.

Chapter 4, Read and Write – Behind the Scenes, is written to help you understand the core concepts of the system. We'll discuss the operations that Cassandra performs for every read and write query, along with the data structures and caches it uses. We'll also discuss the configuration options it provides for tuning the trade-off between consistency and latency. In the later parts of this chapter, we'll see how to trace a Cassandra read/write query to debug performance issues.

Chapter 5, Writing Your Cassandra Client, provides code samples that show how an application connects to a Cassandra cluster and performs read/write operations. By this point, you will have set up your cluster, learned the core concepts of Cassandra, and created your database and schema; now it's time to see how your application uses them.

Chapter 6, Monitoring and Tuning a Cassandra Cluster, covers various tools that can be used to monitor your Cassandra cluster. After you set up your application and cluster, you need to know how to monitor the cluster to keep it running reliably. We'll also discuss various tuning parameters used to fine-tune Cassandra with regard to our hardware and networking environment.

Chapter 7, Backup and Restore, explains how to back up and restore Cassandra data. Although Cassandra is highly available with no single point of failure, there can be scenarios in which we need to restore data from an old snapshot; for example, a buggy client might corrupt our data, and we want to recover from the previous day's snapshot. For situations like this, Cassandra provides options for backing up data and various restore techniques. You'll learn about these techniques in this chapter.

What you need for this book

In this book, we'll set up a Cassandra cluster. The latest Cassandra server release can be downloaded from http://cassandra.apache.org/download/. Our examples refer to Cassandra server version 2.x or later; this version requires Java 1.7 or later and Python 2.6 or later. Python is required to run cqlsh, the CQL client provided by Cassandra. In later chapters, we use the DataStax Java driver as the Cassandra client; it can be downloaded from https://github.com/datastax/java-driver. Our examples use driver version 2.1.2. In addition, if you set up a cluster in your development environment, your development machine should have at least 4 GB of RAM and at least a dual-core CPU. When working with the Java client, we expect you to have a basic knowledge of Java. You can use any IDE, for example Eclipse (https://eclipse.org/), to build your Cassandra client. I've provided dependencies for the Maven (https://maven.apache.org/) and Gradle (https://gradle.org/) build tools.

Who this book is for

This book is written with both beginner and intermediate developers in mind. It also covers maintaining and fine-tuning Cassandra, as well as debugging your queries, so that you can get the best out of it. This book is useful for anyone who works with huge datasets and wants to learn Cassandra because traditional relational databases cannot meet their needs for high performance, availability, and scalability. However, you don't need to be familiar with traditional relational concepts. In fact, not knowing the relational model at all might help in some cases, because you won't be designing your database from a traditional relational perspective.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Apache provides source as well as binary tarballs and Debian packages."

A block of code is set as follows:

$ sudo mkdir -p /var/log/cassandra
$ sudo chown -R `whoami` /var/log/cassandra
$ sudo mkdir -p /var/lib/cassandra
$ sudo chown -R `whoami` /var/lib/cassandra

Any command-line input or output is written as follows:

$ java -version
java version "1.7.0_45"

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "OrderPreservingPartitioner is similar to the above, with the same challenges and the additional limitation that it assumes keys are UTF-8 strings".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title in the subject of your message.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.