Mastering Apache Cassandra 3.x - Third Edition

By: Aaron Ploetz, Tejaswi Malepati, Nishant Neeraj

Overview of this book

With ever-increasing rates of data creation, storing data quickly and reliably has become a pressing need. Apache Cassandra is an excellent choice for building fault-tolerant and scalable databases. Mastering Apache Cassandra 3.x teaches you how to build and architect your clusters, configure and work with your nodes, and program in a high-throughput environment, helping you understand the power of Cassandra's newest features. After a brief recap of the basics, you'll move on to deploying and monitoring a production setup, then optimizing it and integrating it with other software. You'll work with the advanced features of CQL and the new storage engine to understand how they function on the server side. You'll explore the integration and interaction of Cassandra components, then examine features such as the token allocation algorithm, CQL3, vnodes, lightweight transactions, and data modeling in detail. Last but not least, you will get to grips with Apache Spark. By the end of this book, you'll be able to analyze big data, and build and manage high-performance databases for your application.

PySpark through Jupyter

If Spark is already installed on the machine and SPARK_HOME is set, the findspark pip package will locate the installation and connect Jupyter to it. Install the package as follows:

pip install findspark

Otherwise, note that pip does not install the PySpark package by default. Hence, to use PySpark through Jupyter without a local Spark installation, you must install it with the following command:

pip install pyspark

For example, suppose a business wants to know the total number of orders placed by each user. Because CQL provides only limited aggregation capabilities, Spark gives us the ability to perform all of the required transformations, along with sorting, for a cleaner report. Setting a custom Spark and Cassandra configuration after starting the Jupyter Notebook is done as follows:

import os
import sys
import findspark

# Locate the local Spark installation (via SPARK_HOME) and add PySpark to sys.path
findspark.init()
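Continuing from the snippet above, a minimal sketch of the full workflow might look like the following. The connector coordinates (spark-cassandra-connector_2.11:2.3.0), the contact point (127.0.0.1), and the store.orders table with its user_id column are placeholder assumptions; substitute the values that match your cluster and schema:

# Configure the Cassandra connector before the SparkSession is created.
# The connector version and host below are assumptions; adjust them to
# your environment.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.0 "
    "--conf spark.cassandra.connection.host=127.0.0.1 "
    "pyspark-shell"
)

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-per-user").getOrCreate()

# Read the (hypothetical) store.orders table into a DataFrame.
orders = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="store", table="orders")
          .load())

# Count orders per user and sort in descending order for the report.
report = (orders.groupBy("user_id")
          .agg(F.count("*").alias("order_count"))
          .orderBy(F.desc("order_count")))

report.show()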