Cassandra High Performance Cookbook

By: Edward Capriolo

Overview of this book

Apache Cassandra is a fault-tolerant, distributed data store that offers linear scalability, making it a suitable storage platform for large, high-volume websites.

This book provides detailed recipes that describe how to use the features of Cassandra and improve its performance. Recipes cover topics ranging from setting up Cassandra for the first time to complex multiple data center installations. The recipe format presents the information in a concise, actionable form.

The book describes in detail how features of Cassandra can be tuned and what the possible effects of tuning can be. Recipes include how to access data stored in Cassandra and how to use third-party tools to help you out. The book also describes how to monitor Cassandra and do capacity planning to ensure it is performing at a high level. Towards the end, it takes you through the use of libraries and third-party applications with Cassandra, and Cassandra integration with Hadoop.

Copying the data directory to new hardware


The Bootstrap and Anti-Entropy Repair processes are generally the best ways to move data to new nodes. In some cases, however, it is more efficient to copy the data files directly with a file copy tool such as rsync. This method is efficient when doing a one-to-one move from old hardware to new hardware.

Getting ready

For this example, assume the node cassandra05 is being replaced by cassandra05-new and the Cassandra data directory is /var/lib/cassandra. The recipe as written runs rsync over SSH, so it needs an SSH server on cassandra05-new and an SSH client on cassandra05; any other method of transferring binary data, such as FTP, would also work.

How to do it...

  1. Create an executable script /root/sync.sh that uses the rsync command:

    nohup rsync -av --delete --progress /var/lib/cassandra/data \
    root@cassandra05-new:/var/lib/cassandra/ 2> /tmp/sync.err \
    1> /tmp/sync.out &
    
    $ chmod a+x /root/sync.sh
    $ sh /root/sync.sh
    
  2. On the source server, cassandra05, stop the Cassandra process and run the sync again. It will take much less time than...
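
The two passes described in the steps above could be combined into one script. The following is only a sketch under stated assumptions: the host names and paths come from this recipe, while the `DRY_RUN` switch, the `STOP_CMD` variable, and its default value `service cassandra stop` are hypothetical and should be replaced with whatever actually stops Cassandra on the node. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Two-pass copy sketch. Paths and host names are taken from the recipe.
# DRY_RUN and STOP_CMD are hypothetical knobs added for illustration.
SRC_DIR="${SRC_DIR:-/var/lib/cassandra/data}"
DEST="${DEST:-root@cassandra05-new:/var/lib/cassandra/}"
# Assumption: adjust to however Cassandra is stopped on this node.
STOP_CMD="${STOP_CMD:-service cassandra stop}"
# 1 (default) prints commands only; set to 0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# One rsync pass: mirror the data directory to the new host.
sync_pass() {
    run rsync -av --delete "$SRC_DIR" "$DEST"
}

two_pass_copy() {
    # Pass 1: bulk copy while Cassandra is still running.
    sync_pass || return 1
    # Stop the node so the data files stop changing.
    # STOP_CMD is left unquoted on purpose so it splits into words.
    run $STOP_CMD || return 1
    # Pass 2: rsync transfers only files that changed since pass 1.
    sync_pass
}

two_pass_copy
```

Running it unchanged prints the three commands in order; running it with `DRY_RUN=0` on cassandra05 would execute them, so the dry-run output is worth reviewing first.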