Book Image

Mastering Apache Cassandra - Second Edition

Book Image

Mastering Apache Cassandra - Second Edition

Overview of this book

Table of Contents (15 chapters)
Mastering Apache Cassandra Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Cassandra with Hadoop MapReduce


Cassandra provides built-in support for Hadoop. If you have ever written a MapReduce program, you will find out that writing a MapReduce task with Cassandra is quite similar to how one would write a MapReduce task for the data stored in HDFS. Cassandra supports input to Hadoop with ColumnFamilyInputFormat and output with the ColumnFamilyOutputFormat classes, respectively. Apart from these, you will need to put Cassandra-specific settings for Hadoop via ConfigHelper. These three classes are enough to get you started. Another class that might be worth looking at is BulkOutputFormat. All these classes are under the org.apache.cassandra.hadoop.* package.

To be able to compile the MapReduce code that uses Cassandra as data source or data sink, you must have cassandra-all.jar in your classpath. You will also need to make Hadoop to be able to see JARs in the Cassandra library. We will discuss this later in this chapter.

Let's understand the classes that we will be...