Mastering Apache Cassandra

By: Nishant Neeraj

Overview of this book

Apache Cassandra is a strong choice for building fault-tolerant and scalable databases. Implementing Cassandra will enable you to take advantage of its features, which include replication of data across multiple datacenters with low latency. This book details these features and guides you towards mastering the art of building high-performing databases.

Mastering Apache Cassandra aims to give you enough knowledge to program pragmatically and to understand the limitations of Cassandra. You will also learn how to deploy a production setup and monitor it, understand what happens under the hood, and how to optimize and integrate it with other software.

Mastering Apache Cassandra begins with a discussion of Cassandra's philosophy and design decisions, while helping you understand how you can implement it to resolve business issues and run complex applications simultaneously.

You will also get to know how the various components of Cassandra work with each other to give a robust distributed system. The different mechanisms that it provides to solve old problems in new ways are not as twisted as they seem; Cassandra is all about simplicity. Learn how to set up a cluster that can face a tornado of data reads and writes without wincing.

If you are a beginner, you can use the examples to play around with Cassandra and test the water. If you are at an intermediate level, you may prefer to use this guide to dive into the architecture. If you work in DevOps, this book will help you manage and optimize your infrastructure. If you are a CTO, this book will help you unleash the power of Cassandra and discover the resources that it requires.

CRUD with cassandra-cli


Cassandra is up and running. Let's test the waters by performing a complete CRUD (create, retrieve, update, and delete) operation in cassandra-cli, the Cassandra command-line interface. It can be accessed from $CASSANDRA_HOME/bin/cassandra-cli, and you can learn more about it in the Appendix. The following snippets walk through the complete operation.

# Log into cassandra-cli
$ /home/nishant/apps/apache-cassandra-1.1.11/bin/cassandra-cli -h localhost
Connected to: "nishant_sandbox" on localhost/9160
Welcome to Cassandra CLI version 1.1.11

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

Create a keyspace named crud. Note that we are not setting any of the options that a keyspace accepts during its creation; we are just using the defaults. We will learn about those options in the Keyspaces section in Chapter 3, Design Patterns.

[default@unknown] CREATE KEYSPACE crud;
e9f103f5-9fb8-38c9-aac8-8e6e58f91148
Waiting for schema agreement...
... schemas agree across the cluster

Create a column family test_cf. Again, we are using just the default settings. The advanced settings will come later in this book. The ellipses in the following command are not a part of the command. They are added by cassandra-cli to mark a continuation from the previous line. Here, DEFAULT_VALIDATION_CLASS is the default type of the values you are going to store in the columns, KEY_VALIDATION_CLASS is the type of the row key (the primary key), and COMPARATOR is the type of the column name.

You may be wondering why it is called COMPARATOR and not something like COLUMN_NAME_VALIDATION_CLASS, in line with the other attributes. The reason is that column names perform an important task: sorting. Columns are validated and sorted by the class that we mention as the comparator. We will see this property in action shortly. The important thing is that you can write your own comparator and have data stored and fetched in a custom order. We will see how to create a custom comparator in the Writing a custom comparator section in Chapter 3, Design Patterns.

[default@unknown] USE crud;
Authenticated to keyspace: crud
[default@crud] CREATE COLUMN FAMILY test_cf
...	WITH                                   
...	DEFAULT_VALIDATION_CLASS = UTF8Type AND
...	KEY_VALIDATION_CLASS = LongType AND    
...	COMPARATOR = UTF8Type;                 
256297f8-1d96-3ba9-9061-7964684c932a
Waiting for schema agreement...
... schemas agree across the cluster

It is fairly easy to insert the data. The pattern is COLUMN_FAMILY[ROW_KEY][COLUMN_NAME] = COLUMN_VALUE.

[default@crud] SET test_cf[1]['first_column_name'] = 'first value';
Value inserted.
Elapsed time: 71 msec(s).
[default@crud] SET test_cf[1]['2nd_column_name'] = 'some text value';
Value inserted.
Elapsed time: 2.59 msec(s).
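
One way to picture the COLUMN_FAMILY[ROW_KEY][COLUMN_NAME] = COLUMN_VALUE pattern is as a map of maps: a row key points to a set of columns, and each column name points to a value. The following Python sketch is only a mental model under that assumption, not a description of how Cassandra stores data. The names column_family and set_column are hypothetical and exist only for this illustration.

```python
# A toy, in-memory picture of a column family: row key -> {column name -> value}.
# Illustrative model only; this is not Cassandra's storage engine or client API.

column_family = {}  # hypothetical stand-in for test_cf


def set_column(cf, row_key, column_name, value):
    """Mimic SET cf[row_key][column_name] = value in the toy model."""
    if not isinstance(row_key, int):
        # Mirrors KEY_VALIDATION_CLASS = LongType: row keys are integers.
        raise TypeError("row key must be an integer (LongType)")
    if not isinstance(column_name, str) or not isinstance(value, str):
        # Mirrors COMPARATOR and DEFAULT_VALIDATION_CLASS, both UTF8Type.
        raise TypeError("column name and value must be text (UTF8Type)")
    # Create the row on first use, then store the column in it.
    cf.setdefault(row_key, {})[column_name] = value


set_column(column_family, 1, "first_column_name", "first value")
set_column(column_family, 1, "2nd_column_name", "some text value")
print(column_family[1]["first_column_name"])
```

In this model, a row comes into existence the moment its first column is written, which matches the way the SET commands above address row 1 without creating it first.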

Retrieval is just as easy, and there are a couple of ways to get data. To retrieve all the columns in a row, perform GET COLUMN_FAMILY_NAME[ROW_KEY]; to get a particular column, perform GET COLUMN_FAMILY_NAME[ROW_KEY][COLUMN_NAME]. To get N rows, perform LIST with a LIMIT clause, as in LIST COLUMN_FAMILY_NAME LIMIT N. The following snippet retrieves all the columns in row 1:

[default@crud] GET test_cf[1];        
=> (column=2nd_column_name, value=some text value, timestamp=1376234991712000)
=> (column=first_column_name, value=first value, timestamp=1376234969488000)
Returned 2 results.
Elapsed time: 92 msec(s).

Did you notice that the columns are printed sorted by column name, as determined by the comparator, and not in the order of insertion?
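
The ordering seen in the GET output can be reproduced with a small Python sketch. It assumes that a UTF8Type comparator can be modelled as a byte-wise comparison of the UTF-8 encoded column names; that is a simplification for illustration, not a statement about Cassandra's internals.

```python
# Model the column order as a sort over the UTF-8 bytes of each column name.
# Assumption: byte-wise comparison stands in for the UTF8Type comparator.

inserted = ["first_column_name", "2nd_column_name"]  # order of the SET commands

stored = sorted(inserted, key=lambda name: name.encode("utf-8"))
print(stored)  # ['2nd_column_name', 'first_column_name']
```

The digit 2 sorts before the letter f, so 2nd_column_name comes first even though it was inserted second.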

To delete a row or a column, pass the row or the column to the DEL command:

# Delete a column
[default@crud] DEL test_cf[1]['2nd_column_name'];
column removed.

# column is deleted
[default@crud] GET test_cf[1];                   
=> (column=first_column_name, value=first value, timestamp=1376234969488000)
Returned 1 results.
Elapsed time: 3.38 msec(s).

Updating a column in a row is nothing but inserting a new value into that column. An insert in Cassandra behaves like the upsert that some RDBMS vendors offer:

[default@crud] SET test_cf[1]['first_column_name'] = 'insert is basically upsert :)';
Value inserted.
Elapsed time: 2.44 msec(s).

# the column is updated.
[default@crud] GET test_cf[1];               
=> (column=first_column_name, value=insert is basically upsert :), timestamp=1376235103158000)
Returned 1 results.
Elapsed time: 3.31 msec(s).
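
The upsert behaviour can be sketched in Python as a write that either creates a column or overwrites it, keeping whichever value carries the newer timestamp. This is a simplified model: the function write_cell and the plain dictionary are hypothetical, and the tie-breaking rule for equal timestamps is an assumption made for this sketch. The timestamps are the microsecond values shown in the CLI output above.

```python
import time


def write_cell(row, column_name, value, timestamp=None):
    """Insert or overwrite a column; the write with the newest timestamp wins."""
    if timestamp is None:
        # Microseconds since the epoch, the same unit as the CLI output.
        timestamp = int(time.time() * 1_000_000)
    current = row.get(column_name)
    # Simplification: on equal timestamps the later call wins.
    if current is None or timestamp >= current[1]:
        row[column_name] = (value, timestamp)


row = {}  # toy stand-in for row 1 of test_cf
write_cell(row, "first_column_name", "first value", 1376234969488000)
write_cell(row, "first_column_name", "insert is basically upsert :)", 1376235103158000)
print(row["first_column_name"][0])
```

There is no separate update path in this model: the second write goes through exactly the same function as the first, which is the point the SET example makes.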

To view a schema, use the SHOW SCHEMA command. It prints the commands that would create the specified keyspace and all the column families in it, with all the available options. Since we did not set any options, we see the default values for all of them:

[default@crud] SHOW SCHEMA crud;
create keyspace crud
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {datacenter1 : 1}
  and durable_writes = true;

use crud;

create column family test_cf
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'LongType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

Another common need while learning Cassandra is to wipe all the data in a column family. The TRUNCATE command does that for us:

# clean test_cf
[default@crud] TRUNCATE test_cf;
test_cf truncated.

# list all the data in test_cf
[default@crud] LIST test_cf;
Using default limit of 100
Using default column limit of 100

0 Row Returned.
Elapsed time: 41 msec(s).

Dropping a column family or a keyspace is as easy as mentioning the entity type and name after the DROP command. Here is a demonstration:

# Drop test_cf
[default@crud] drop column family  test_cf;
29d44ab2-e4ab-3e22-a8ab-19de0c40aaa5
Waiting for schema agreement...
... schemas agree across the cluster

# No more test_cf in the schema
[default@crud] show schema crud;           
create keyspace crud
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {datacenter1 : 1}
  and durable_writes = true;

use crud;

# Drop keyspace
[default@crud] drop keyspace crud;         
45583a34-0cde-3d7d-a754-b7536d7dd3af
Waiting for schema agreement...
... schemas agree across the cluster

# No such schema
[default@unknown] show schema crud;  
Keyspace 'crud' not found.

# Exit from cassandra-cli
[default@unknown] exit;

Note

Notice that all the commands must end with a semicolon.
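
If you keep your statements in a file for later reuse, a small helper can guard against a forgotten semicolon. The following Python sketch is one possible approach. The function to_cli_script is hypothetical, and the statements are the ones used in this section. How such a file is fed to cassandra-cli depends on your version, so check the CLI's own help output.

```python
def to_cli_script(statements):
    """Join statements into script text, one per line, each ending in ';'."""
    lines = []
    for statement in statements:
        statement = statement.strip()
        # Append the terminator only when the statement lacks one.
        if not statement.endswith(";"):
            statement += ";"
        lines.append(statement)
    return "\n".join(lines) + "\n"


script = to_cli_script([
    "CREATE KEYSPACE crud",
    "USE crud;",
    "CREATE COLUMN FAMILY test_cf WITH DEFAULT_VALIDATION_CLASS = UTF8Type "
    "AND KEY_VALIDATION_CLASS = LongType AND COMPARATOR = UTF8Type",
    "SET test_cf[1]['first_column_name'] = 'first value'",
    "GET test_cf[1]",
])
print(script, end="")
```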