Scaling Big Data with Hadoop and Solr, Second Edition

By: Hrishikesh Vijay Karambelkar


Apache Solr and Cassandra


Cassandra is one of the most widely used distributed, fault-tolerant NoSQL databases. It is designed to handle big data workloads across multiple nodes with no single point of failure. Performance benchmarks published at Planet Cassandra (http://planetcassandra.org/NoSQL-performance-benchmarks/) place Apache Cassandra among the fastest NoSQL databases in terms of throughput, load, and so on. Apache Cassandra allows schema-less storage of user information using a storage pattern called column families. For example, look at the data model for sales lead information as shown in the following screenshot:

This model, when transformed for the Cassandra store, becomes columnar storage. The following screenshot shows how this model would look using Apache Cassandra:

As one can see, the key here is the customer ID, and the value is a set of attributes/columns that vary for each row key. Further, columns...
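
The row-key-to-columns layout described above can be sketched in plain Python as a dictionary of dictionaries. This is only an in-memory illustration of the idea, not Cassandra's API or its on-disk format; the customer IDs, column names, values, and helper functions are all hypothetical and are not taken from the book's screenshots.

```python
# A column family modeled as: row key -> {column name: value}.
# All keys, column names, and values below are made up for illustration.
sales_leads = {
    "customer-001": {"name": "Acme Corp", "city": "Pune", "lead_status": "open"},
    "customer-002": {"name": "Globex", "phone": "555-0100"},
}


def columns_for(column_family, row_key):
    """Return the sorted column names stored under one row key."""
    return sorted(column_family.get(row_key, {}))


def get_column(column_family, row_key, column, default=None):
    """Look up one column for a row; rows need not share the same columns."""
    return column_family.get(row_key, {}).get(column, default)


if __name__ == "__main__":
    for key in sales_leads:
        print(key, columns_for(sales_leads, key))
    # A column that a row never stored simply comes back as the default.
    print(get_column(sales_leads, "customer-002", "city"))
```

Each row carries its own set of columns, so a row with no `city` column stores nothing for it, rather than holding an empty cell as a fixed relational schema would.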