Learning Apache Cassandra - Second Edition

Overview of this book

Cassandra is a distributed database that stands out for its robust feature set and intuitive interface, while providing the high availability and scalability of a distributed data store. This book introduces you to Cassandra's rich feature set and shows you how to create and manage a highly scalable, performant, and fault-tolerant database layer. It starts by explaining the new features in Cassandra 3.x and getting you set up with Cassandra. Then you'll walk through data modeling in Cassandra and the features available for designing a flexible schema. Next, you'll learn to create tables with composite partition keys, collections, and user-defined types, and get to know different methods to avoid denormalization of data. You will then create user-defined functions and aggregates in Cassandra. After that, you will set up a multi-node cluster and see how Cassandra's dynamics change with it. Finally, you will implement some application-level optimizations using a Java client. By the end of this book, you'll be equipped to build powerful, scalable Cassandra database layers for your applications.
Table of Contents (14 chapters)

The structure of a simple primary key table

To start, let's look at the users table using the cassandra-cli LIST command, which prints all the data in a given column family:

LIST users;

This will print out a long list of information, grouped by RowKey. For brevity, the first couple of RowKey groups appear as follows:

Although we've never seen it structured like this before, the data here should look pretty familiar. The RowKey headers correspond to the username column in our CQL3 table structure. Within each RowKey is a collection of tuples, each containing a name, a value, and a timestamp. We will call these tuples cells, in keeping with the terminology used in the cassandra-cli interface itself.
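To make this layout concrete, here is a minimal Java sketch of the structure just described. Each RowKey maps to a collection of cells, and each cell is a name-value-timestamp tuple. This is an illustrative model only, not Cassandra's actual storage engine. The class name RowKeyLayout, its methods, the choice to keep cells sorted by name, the printed format, and the sample data are all hypothetical assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the RowKey -> cells layout; not Cassandra's real internals.
public class RowKeyLayout {

    // One name-value-timestamp tuple, called a "cell" in this chapter.
    record Cell(String name, String value, long timestamp) {}

    // RowKey -> that row's cells, keyed (and therefore sorted) by cell name.
    private final Map<String, TreeMap<String, Cell>> rows = new TreeMap<>();

    // Stores one cell under the given RowKey, replacing any cell with the same name.
    void put(String rowKey, String name, String value, long timestamp) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(name, new Cell(name, value, timestamp));
    }

    // Returns the cells of one row in cell-name order, or an empty list if the row is absent.
    List<Cell> cells(String rowKey) {
        TreeMap<String, Cell> row = rows.get(rowKey);
        return row == null ? List.of() : List.copyOf(row.values());
    }

    // Prints one group per RowKey; the exact format here is hypothetical.
    void print() {
        for (Map.Entry<String, TreeMap<String, Cell>> row : rows.entrySet()) {
            System.out.println("RowKey: " + row.getKey());
            for (Cell c : row.getValue().values()) {
                System.out.printf("  (name=%s, value=%s, timestamp=%d)%n",
                        c.name(), c.value(), c.timestamp());
            }
        }
    }

    public static void main(String[] args) {
        RowKeyLayout users = new RowKeyLayout();
        // Hypothetical sample rows keyed by username; timestamps are arbitrary.
        users.put("alice", "email", "alice@example.com", 1L);
        users.put("alice", "location", "Paris", 1L);
        users.put("bob", "email", "bob@example.com", 2L);
        users.print();
    }
}
```

Running main prints one group per RowKey, mirroring how LIST groups its output by RowKey, though the real cassandra-cli formatting is not reproduced here.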

Note

You might encounter the word column being used for the name-value-timestamp tuples we are exploring here. Not only does that terminology invite confusion with the concept of a column in CQL3, but it's also a singularly misleading way to describe the data structure in question...