Cassandra High Performance Cookbook

By: Edward Capriolo

Overview of this book

Apache Cassandra is a fault-tolerant, distributed data store that offers linear scalability, which makes it suitable as a storage platform for large, high-volume websites.

This book provides detailed recipes that describe how to use the features of Cassandra and improve its performance. Recipes cover topics ranging from setting up Cassandra for the first time to complex multiple data center installations. The recipe format presents the information in a concise, actionable form.

The book describes in detail how features of Cassandra can be tuned and what the possible effects of tuning can be. Recipes include how to access data stored in Cassandra and how to use third-party tools to help you out. The book also describes how to monitor Cassandra and do capacity planning to ensure it is performing at a high level. Towards the end, it takes you through the use of libraries and third-party applications with Cassandra, and Cassandra integration with Hadoop.

Setting up a "Shadow" data center for running only MapReduce jobs


MapReduce and other Extract, Transform, Load (ETL) jobs can be resource-intensive, which can interfere with Cassandra's ability to serve other requests promptly. This recipe shows how to set up a second Cassandra data center dedicated to ETL, as depicted in the following image:

Getting ready

Review the chapter on multi-datacenter deployments for recipes on multiple data center setups.
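
As background for the steps that follow, here is a rough sketch of what the replication layout used in this recipe implies for per-node storage. It assumes tokens are evenly balanced within each data center, which the recipe does not state; the function name `per_node_share` and the `layout` dictionary are hypothetical names used only for illustration. The replica counts (three in DC1, one in DC2) and host counts (five in DC1, three in DC2) are taken from the steps below.

```python
from fractions import Fraction


def per_node_share(replicas: int, nodes: int) -> Fraction:
    """Approximate fraction of the keyspace stored on each node of one data center.

    Assumes evenly balanced tokens within the data center (an assumption,
    not something the recipe guarantees).
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    # A data center cannot hold more distinct replicas of a row than it has nodes.
    return Fraction(min(replicas, nodes), nodes)


# (replicas per row, number of hosts) for each data center, as used in this recipe.
layout = {"DC1": (3, 5), "DC2": (1, 3)}

shares = {dc: per_node_share(rf, n) for dc, (rf, n) in layout.items()}
total_copies = sum(min(rf, n) for rf, n in layout.values())

for dc, share in shares.items():
    print(f"{dc}: each node holds roughly {float(share):.0%} of the keyspace")
print(f"Copies of each row across the cluster: {total_copies}")
```

Under these assumptions, the ETL data center keeps a single full copy of the data spread over its hosts, so MapReduce jobs can read everything locally without touching the serving nodes in DC1.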

How to do it...

  1. Create a keyspace that is replicated three times in DC1, but only once in DC2:

    [default@unknown] create keyspace ks33 with
      placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options=[{DC1:3,DC2:1}];
    
  2. Open <cassandra_home>/conf/cassandra-topology.properties in your text editor. Create an entry for each host, placing hosts 1-5 in DC1 and hosts 6-8 in DC2:

    10.1.2.1=DC1:rack1 #cas1
    10.1.2.2=DC1:rack1
    10.1.2.3=DC1:rack1
    10.1.2.4=DC1:rack1
    10.1.2.5=DC1:rack1
    10.2.5.9=DC2:rack1 #cas6
    10.2.3.4=DC2:rack1 #cas7
    10...