Learning Apache Cassandra

By: Matthew Brown


The structure of a simple primary key table


To start with, let's have a look at the users table. We'll use the LIST command, which prints all the data in a given column family:

LIST users;

This will print out a long list of information, grouped by RowKey. For brevity, the first couple of RowKey groups appear as follows:

Although we've never seen it structured like this before, the data here should look pretty familiar. The RowKey headers correspond to the username column in our CQL3 table structure. Within each RowKey is a collection of tuples, each containing a name, a value, and a timestamp. In keeping with the terminology used in the cassandra-cli interface itself, we will call these tuples cells.
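To make this structure concrete, here is a minimal Python sketch of the layout that LIST reveals: a column family modelled as a mapping from RowKey to a list of cells, each cell a name-value-timestamp tuple. The names Cell, ColumnFamily, and list_column_family are hypothetical illustrations rather than anything in Cassandra or cassandra-cli, the sample data is made up, and the printed format only loosely resembles the CLI's own output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cell:
    """One name-value-timestamp tuple stored under a RowKey."""
    name: str
    value: bytes    # raw bytes; interpreting them is left to the reader
    timestamp: int  # write time, conventionally microseconds since the epoch


# A column family: each RowKey maps to the cells stored under it.
ColumnFamily = Dict[str, List[Cell]]


def list_column_family(cf: ColumnFamily) -> List[str]:
    """Render a column family grouped by RowKey, roughly in the spirit of LIST."""
    lines: List[str] = []
    for row_key, cells in cf.items():
        lines.append(f"RowKey: {row_key}")
        for cell in cells:
            # With no type information, the value can only be shown as hex.
            lines.append(f"=> (name={cell.name}, value={cell.value.hex()}, "
                         f"timestamp={cell.timestamp})")
    return lines


# Hypothetical sample data, not the book's actual LIST output.
users: ColumnFamily = {
    "happycorp": [
        Cell("email", b"happycorp@example.com", 1400000000000000),
        Cell("location", b"New York", 1400000000000000),
    ],
}

for line in list_column_family(users):
    print(line)
```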

Note

You might encounter the word column being used for the name-value-timestamp tuples we are exploring here. Not only does that terminology invite confusion with the concept of a column in CQL3, but it's also a singularly misleading way to describe the data structure in question. We'll stick with "cell", which is both unambiguous and more descriptive. Please excuse us for referring to collections of cells as column families—there is no better alternative.

Exploring cells

Looking at the name attribute, we see things like email, encrypted_password, version, and location. Clearly, the name attribute of the cells corresponds to the names of columns in our CQL schema—although the relationship is more complex than it might appear, as we'll explore in the next section.

The value field in the cells is a bit of a mystery; given that the name contains column names, we might expect that the value would contain column values. However, what we see in cassandra-cli are just some inscrutable hexadecimal blobs.

As it turns out, under the hood Cassandra stores all data as raw byte arrays, which cassandra-cli displays in hexadecimal; the type system is part of CQL's abstraction layer. The cassandra-cli utility does give us a way to retrieve human-readable values of individual cells, using the AS keyword to specify the type explicitly. Let's try to read the value of the cell named email from HappyCorp's user record:

GET users['happycorp']['email'] AS ascii;

The GET command allows us to access a single cell by first specifying the RowKey value, then the cell name. We can also omit the cell name to return all the cells at a given RowKey.

In this case, thanks to our use of AS, we can see a human-readable value for HappyCorp's email address:

Thankfully, the CLI will remember our preference for reading emails as ASCII, and will decode every cell named email in the users column family the same way for the remainder of the session.
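The behaviour described above (GET addressing a cell by RowKey and then by cell name, AS supplying a type for bytes that would otherwise print as hex, and the CLI remembering that type for the session) can be modelled in a few lines of Python. This is a hypothetical sketch under our own assumptions, not cassandra-cli's implementation: the CliSession class, the DECODERS table, the choice to key remembered types by column family and cell name, and the sample data are all illustrative.

```python
from typing import Callable, Dict, Optional, Tuple, Union

# Hypothetical decoders, loosely named after type names the CLI accepts after AS.
DECODERS: Dict[str, Callable[[bytes], str]] = {
    "bytes": lambda raw: raw.hex(),  # default view: the hexadecimal blob
    "ascii": lambda raw: raw.decode("ascii"),
    "utf8": lambda raw: raw.decode("utf-8"),
}

# RowKey -> {cell name: raw value bytes}
ColumnFamily = Dict[str, Dict[str, bytes]]


class CliSession:
    """Hypothetical model of GET with AS, plus per-session type memory."""

    def __init__(self) -> None:
        # (column family name, cell name) -> type chosen earlier via AS
        self.remembered: Dict[Tuple[str, str], str] = {}

    def _decode(self, cf_name: str, cell_name: str, raw: bytes) -> str:
        type_name = self.remembered.get((cf_name, cell_name), "bytes")
        return DECODERS[type_name](raw)

    def get(self, cf_name: str, cf: ColumnFamily, row_key: str,
            cell_name: Optional[str] = None,
            as_type: Optional[str] = None) -> Union[None, str, Dict[str, str]]:
        row = cf.get(row_key)
        if row is None:
            return None  # no such RowKey
        if cell_name is None:
            # Omitting the cell name returns every cell under the RowKey.
            return {name: self._decode(cf_name, name, raw)
                    for name, raw in row.items()}
        if as_type is not None:
            # Remember the explicit type for the rest of the session.
            self.remembered[(cf_name, cell_name)] = as_type
        raw = row.get(cell_name)
        return None if raw is None else self._decode(cf_name, cell_name, raw)


# Hypothetical sample data.
users: ColumnFamily = {
    "happycorp": {"email": b"happycorp@example.com", "version": b"\x00\x01"},
    "acme": {"email": b"acme@example.com"},
}

session = CliSession()
print(session.get("users", users, "happycorp", "email"))  # shown as hex
print(session.get("users", users, "happycorp", "email", as_type="ascii"))
# happycorp@example.com
print(session.get("users", users, "acme", "email"))  # acme@example.com
```

The key design point the sketch tries to capture is that the stored value never changes: only the decoding applied at read time does.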

A model of column families: RowKey and cells

At this point, we've satisfied ourselves that each cell in a column family corresponds to a column name and value at the CQL level. The mapping between CQL3 tables and lower-level column families, so far, seems pretty straightforward. A table row's primary key value is stored in the RowKey field, and each cell contains a name-value pair that represents the value of the named column in the row.

We can make the relationship between the table representation and the column family representation more accessible with a visualization of each:

We thus visualize the column family as a collection of RowKey values, with each RowKey linked to a collection of cells containing a name and a value. Note that not every CQL3 column is represented by a cell in every RowKey field; only when a given column has a value does it have a corresponding cell in the column family. This is consistent with our mental model of CQL3 tables, developed in the Developing a mental model for Cassandra section of Chapter 2, The First Table.
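The simple mapping described so far can be expressed as a small Python function: the primary key value becomes the RowKey, and each column that has a value becomes a cell. The function name to_column_family, the dict-based row representation, the sample rows, and the UTF-8 encoding of every value are simplifying assumptions of ours; real Cassandra serializes each value according to its CQL type, and this sketch captures only the simple picture developed up to this point.

```python
from typing import Any, Dict, List

# RowKey -> {cell name: value bytes}
ColumnFamily = Dict[str, Dict[str, bytes]]


def to_column_family(rows: List[Dict[str, Any]], primary_key: str) -> ColumnFamily:
    """Map CQL-style rows onto the simple RowKey/cells model."""
    cf: ColumnFamily = {}
    for row in rows:
        # The primary key value becomes the RowKey.
        row_key = str(row[primary_key])
        cells: Dict[str, bytes] = {}
        for column, value in row.items():
            if column == primary_key:
                continue  # already represented by the RowKey
            if value is None:
                continue  # a column with no value gets no cell at all
            # Simplifying assumption: every value is serialized as UTF-8 text.
            cells[column] = str(value).encode("utf-8")
        cf[row_key] = cells
    return cf


# Hypothetical rows for a users table keyed on username.
rows = [
    {"username": "happycorp", "email": "happycorp@example.com", "location": None},
    {"username": "alice", "email": None, "location": "Paris"},
]

users = to_column_family(rows, "username")
print(users["happycorp"])  # {'email': b'happycorp@example.com'}
print(users["alice"])      # {'location': b'Paris'}
```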

As it turns out, the similarities between column families and CQL3 tables take us this far and no further.