Mastering Apache Cassandra

By: Nishant Neeraj

Overview of this book

Apache Cassandra is a strong choice for building fault-tolerant and scalable databases. Implementing Cassandra lets you take advantage of its features, which include low-latency replication of data across multiple datacenters. This book details these features and guides you towards mastering the art of building high-performing databases. Mastering Apache Cassandra aims to give you enough knowledge to program pragmatically and to understand the limitations of Cassandra. You will also learn how to deploy and monitor a production setup, understand what happens under the hood, and optimize Cassandra and integrate it with other software.

Mastering Apache Cassandra begins with a discussion of Cassandra's philosophy and design decisions, while helping you understand how you can use it to resolve business issues and run complex applications simultaneously. You will also get to know how the various components of Cassandra work with each other to give a robust distributed system. The different mechanisms that it provides to solve old problems in new ways are not as twisted as they seem; Cassandra is all about simplicity. Learn how to set up a cluster that can face a tornado of data reads and writes without wincing.

If you are a beginner, you can use the examples to play around with Cassandra and test the water. If you are at an intermediate level, you may prefer to use this guide to dive into the architecture. If you work in DevOps, this book will help you manage and optimize your infrastructure. If you are a CTO, this book will help you unleash the power of Cassandra and discover the resources that it requires.

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Free Chapter

Quick Start

Introduction to Cassandra

A brief introduction to a data model

Installing Cassandra locally

CRUD with cassandra-cli

Cassandra in action

Summary

Cassandra Architecture

Problems in the RDBMS world

Enter NoSQL

Cassandra

Cassandra architecture

Summary

Design Patterns

The Cassandra data model

Patterns and antipatterns

Summary

Deploying a Cluster

Evaluating requirements

System configurations

The required software

Installing Cassandra

Configuring a Cassandra cluster

Authorization and authentication

Summary

Performance Tuning

Stress testing

Performance tuning

Summary

Managing a Cluster – Scaling, Node Repair, and Backup

Scaling

Replacing a node

Backup and restoration

Load balancing

Priam – managing large clusters on AWS

Summary

Monitoring

Cassandra JMX interface

Cassandra nodetool

DataStax OpsCenter

Nagios – monitoring and notification

Cassandra log

Troubleshooting

Summary

Integration

Using Hadoop

Hadoop and Cassandra

Cassandra with Hadoop MapReduce

Cassandra and Hadoop in action

Hadoop in Cassandra cluster

Integration with Pig

Cassandra and Solr

Summary

Introduction to CQL 3 and Cassandra 1.2

CQL – the Cassandra Query Language

CQL 3 for Thrift refugees

CQL 3 basics

What's new in Cassandra 1.2?

Support for programming languages

Summary

Index

A brief introduction to a data model

Cassandra has three containers, one within another. The outermost container is the keyspace. You can think of a keyspace as a database in the RDBMS world. Inside a keyspace is the column family, which is like a table. Within a column family are rows, and each row holds columns. Each row is identified by a unique row key, which is like the primary key in an RDBMS.
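The nesting described above can be pictured as dictionaries inside dictionaries. The following is a minimal sketch in plain Python, not Cassandra client code; the names `keyspace`, `users`, `alice`, and `get_column` are made up for illustration.

```python
# Toy in-memory picture of Cassandra's nested containers.
# All names here are hypothetical; this is not a Cassandra API.
keyspace = {                       # keyspace: roughly a database
    "users": {                     # column family: roughly a table
        "alice": {                 # row key: roughly a primary key
            "user_name": "Alice",  # column name -> column value
            "age": 31,
        },
    },
}


def get_column(keyspace, column_family, row_key, column_name):
    """Walk keyspace -> column family -> row -> column."""
    return keyspace[column_family][row_key][column_name]


alice_age = get_column(keyspace, "users", "alice", "age")
```

Reading a value always starts from the row key, which is why the choice of row key matters so much in Cassandra.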

The Cassandra data model

Things were pretty monotonous until now, as you already knew everything that we talked about from RDBMS. The difference is in the way Cassandra treats this data. Column families, unlike tables, can be schema free (schema optional). This means you can have different column names for different rows within the same column family. There may be a row that has user_name, age, phone_office, and phone_home, while another row can have user_name, age, phone_office, office_address, and email.

You can store about two billion columns per row. This makes it very handy for storing time series data, such as tweets or comments on a blog post, where the column name can be the timestamp of each event. Within a row, columns are sorted by their natural order; therefore, we can access the time series data in chronological or reverse chronological order. This is unlike an RDBMS, where each row just takes up space according to the number of columns in it.

The other difference is that, unlike an RDBMS, Cassandra does not have relations. This means relational logic has to be handled at the application level, and because there are no joins, we may want to denormalize data.
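To make the two ideas above concrete, here is a small sketch in plain Python. It assumes a dictionary stands in for a row, and it sorts column names at read time, whereas Cassandra keeps them sorted in storage. The helper names `add_comment` and `read_chronological`, and all sample values, are invented for this example.

```python
# Two rows in the same column family with different column names
# (schema optional). Values are made-up sample data.
users = {
    "row1": {"user_name": "a", "age": 30,
             "phone_office": "555-0100", "phone_home": "555-0101"},
    "row2": {"user_name": "b", "age": 41, "phone_office": "555-0102",
             "office_address": "1 Main St", "email": "b@example.com"},
}

# A wide row for time series: each column name is an event timestamp.
comments_row = {}


def add_comment(row, timestamp, text):
    """Store one event as a column whose name is its timestamp."""
    row[timestamp] = text


def read_chronological(row, reverse=False):
    """Return column values ordered by column name (the timestamp)."""
    return [row[ts] for ts in sorted(row, reverse=reverse)]


# Insert out of order; reads still come back ordered by timestamp.
add_comment(comments_row, 1700000300, "third")
add_comment(comments_row, 1700000100, "first")
add_comment(comments_row, 1700000200, "second")

oldest_first = read_chronological(comments_row)
newest_first = read_chronological(comments_row, reverse=True)
```

Because the ordering comes from the column names, choosing the timestamp as the column name is what makes chronological reads cheap in this pattern.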

Rows are identified by a row key. The row key also determines how data is partitioned: rows are distributed across the cluster by key, which gives you effective auto-sharding. Each server holds one or more ranges of keys. So, if the cluster is balanced, a cluster with more nodes will have fewer rows per node. All these concepts will be covered in detail in later chapters.
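The idea that each server holds a range of keys can be sketched with a toy hash ring. This is a simplified illustration, not Cassandra's actual partitioner code: it assumes an MD5 hash of the row key as the token and evenly spaced node tokens, and the names `token_for`, `build_ring`, `owner_of`, and `rows_per_node` are hypothetical.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING_SIZE = 2 ** 128  # MD5 digests are 128 bits wide


def token_for(row_key):
    """Hash a row key to a position (token) on the ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def build_ring(node_names):
    """Assign evenly spaced tokens; returns sorted (token, node) pairs."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def owner_of(row_key, ring):
    """Pick the first node whose token is >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    index = bisect_left(tokens, token_for(row_key))
    return ring[index % len(ring)][1]


def rows_per_node(row_keys, ring):
    """Count how many of the given row keys land on each node."""
    return Counter(owner_of(key, ring) for key in row_keys)


row_keys = ["user-%d" % i for i in range(1000)]
three_nodes = build_ring(["node-a", "node-b", "node-c"])
six_nodes = build_ring(["node-%d" % i for i in range(6)])

counts_three = rows_per_node(row_keys, three_nodes)
counts_six = rows_per_node(row_keys, six_nodes)
```

Comparing `counts_three` with `counts_six` shows the same 1,000 rows spread more thinly as nodes are added, which is the balancing effect the paragraph describes.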
