Learning Apache Cassandra - Second Edition

Overview of this book

Cassandra is a distributed database that stands out thanks to its robust feature set and intuitive interface, while providing the high availability and scalability of a distributed data store. This book will introduce you to the rich feature set offered by Cassandra, and empower you to create and manage a highly scalable, performant, and fault-tolerant database layer. The book starts by explaining the new features implemented in Cassandra 3.x and gets you set up with Cassandra. Then you'll walk through data modeling in Cassandra and the features available to design a flexible schema. Next, you'll learn to create tables with composite partition keys, collections, and user-defined types, and get to know different methods to avoid denormalization of data. You will then proceed to create user-defined functions and aggregates in Cassandra. After that, you will set up a multi-node cluster and see how the dynamics of Cassandra change with it. Finally, you will implement some application-level optimizations using a Java client. By the end of this book, you'll be fully equipped to build powerful, scalable Cassandra database layers for your applications.
Table of Contents (14 chapters)

Chapter 10. How Cassandra Distributes Data

Much of Cassandra's power lies in the fact that it is a distributed database: rather than storing all of your data on a single machine, it is designed to distribute data across multiple machines. A distributed architecture is hugely beneficial for scalability, since you're not bound by the hardware capacity of a single machine; if you need more storage or more processing power, you can simply add more nodes to your Cassandra cluster. It's also a boon for availability: by storing multiple copies of your data on multiple machines, Cassandra is resilient to the failure of a particular node.
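To make the idea of spreading data and copies across machines concrete, here is a minimal sketch of a token ring, the consistent-hashing scheme Cassandra uses to assign partitions to nodes. It is illustrative only, not Cassandra's implementation. Assumptions: the `TokenRing` class, the node names, and the key names are hypothetical; MD5 stands in for Cassandra's default Murmur3 partitioner; tokens lie in [0, 2**64) rather than Murmur3's signed range; and each node holds a single, evenly spaced token (no virtual nodes). Replica placement follows the idea behind SimpleStrategy: the first copy goes to the node owning the key's token range, and further copies go to the next nodes clockwise.

```python
import bisect
import hashlib


class TokenRing:
    """Hypothetical, simplified token ring: one token per node, no vnodes."""

    # Assumption: tokens in [0, 2**64) instead of Murmur3's signed 64-bit range.
    RING_SIZE = 2 ** 64

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and the node count")
        self.rf = replication_factor
        self.nodes = list(nodes)
        # Space tokens evenly; node i owns the range (tokens[i-1], tokens[i]],
        # and node 0 also owns everything above the last token (wrap-around).
        step = self.RING_SIZE // len(self.nodes)
        self.tokens = [i * step for i in range(len(self.nodes))]

    @classmethod
    def token_for(cls, partition_key):
        # Assumption: MD5 stands in for the Murmur3 partitioner. The first
        # 8 bytes of the digest give a token in [0, 2**64).
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def replicas_for(self, partition_key):
        token = self.token_for(partition_key)
        # Primary owner: first node whose token is >= the key's token;
        # past the last token, wrap back to the first node.
        start = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        # Remaining replicas: the next distinct nodes walking clockwise.
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]


# Hypothetical four-node cluster storing three copies of every partition.
ring = TokenRing(["node1", "node2", "node3", "node4"], replication_factor=3)
for key in ["alice", "bob", "carol"]:
    print(key, ring.token_for(key), ring.replicas_for(key))
```

Because every partition lands on three distinct nodes in this sketch, losing any single node still leaves two copies of each partition, which is the availability property described above.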

The beauty of a distributed database such as Cassandra is that, as application developers, we rarely need to think about the fact that we're working with data that's spread across multiple servers. We've spent the last nine chapters exploring a wide range of Cassandra's functionality, and the interfaces we've worked with never require us to explicitly account for...