Scaling Big Data with Hadoop and Solr, Second Edition

Cassandra is one of the most widely used distributed, fault-tolerant NoSQL databases. It is designed to handle big data workloads across multiple nodes with no single point of failure. Performance benchmarks published at Planet Cassandra (http://planetcassandra.org/NoSQL-performance-benchmarks/) place Apache Cassandra among the fastest NoSQL databases in terms of throughput, load, and so on. Apache Cassandra allows schema-less storage of user information using a pattern called column families. For example, look at the data model for sales lead information as shown in the following screenshot:

This model, when transformed for the Cassandra store, becomes columnar storage. The following screenshot shows how this model would look using Apache Cassandra:

As you can see, the row key here is the customer ID, and the value is a set of attributes/columns that can vary from one row key to the next. Further, columns...
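The row-key-to-columns layout described above can be mimicked with plain Python dictionaries. This is only an in-memory sketch to illustrate the shape of the data, not Cassandra's API or storage engine; the names `sales_leads`, `put`, and `columns_of`, and all customer IDs and column values, are hypothetical and do not come from the book.

```python
# Hypothetical in-memory stand-in for a "sales_leads" column family:
# each row key (customer ID) maps to its own dictionary of columns.
sales_leads = {}


def put(row_key, column, value):
    """Set one column on a row, creating the row if it does not exist."""
    sales_leads.setdefault(row_key, {})[column] = value


def columns_of(row_key):
    """Return the sorted column names stored for a row (empty if absent)."""
    return sorted(sales_leads.get(row_key, {}))


# Rows are free to carry different sets of columns (made-up sample data).
put("cust-001", "name", "Acme Corp")
put("cust-001", "region", "EMEA")
put("cust-001", "lead_status", "qualified")
put("cust-002", "name", "Globex")
put("cust-002", "phone", "555-0100")

for customer_id in sorted(sales_leads):
    print(customer_id, columns_of(customer_id))
```

Each customer ID owns an independent set of columns, so adding a `phone` column to one row does not require any change to the others.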

By: Hrishikesh Vijay Karambelkar

Apache Solr and Cassandra