Learning Apache Cassandra - Second Edition

Overview of this book

Cassandra is a distributed database that stands out thanks to its robust feature set and intuitive interface, while providing the high availability and scalability of a distributed data store. This book introduces you to the rich feature set offered by Cassandra and empowers you to create and manage a highly scalable, performant, and fault-tolerant database layer. The book starts by explaining the new features implemented in Cassandra 3.x and gets you set up with Cassandra. Then you'll walk through data modeling in Cassandra and the rich feature set available for designing a flexible schema. Next, you'll learn to create tables with composite partition keys, collections, and user-defined types, and get to know different methods to avoid denormalization of data. You will then proceed to create user-defined functions and aggregates in Cassandra. Then, you will set up a multi-node cluster and see how the dynamics of Cassandra change with it. Finally, you will implement some application-level optimizations using a Java client. By the end of this book, you'll be fully equipped to build powerful, scalable Cassandra database layers for your applications.
Table of Contents (14 chapters)

Challenges of modern applications


Before we delve into the shortcomings of relational systems in handling big data, let's take a look at some of the challenges faced by modern web-facing and big data applications.

This will later give some insight into how NoSQL data stores, and Cassandra in particular, help solve these issues:

  • One of the most important challenges faced by a web-facing application is handling a large number of concurrent users. Think of a search engine such as Google, which handles millions of concurrent users at any given point in time, or a large online retailer. These applications should respond swiftly even as the number of users keeps growing.
  • Modern applications need to be able to handle large amounts of data, which can grow to several petabytes and beyond. Consider a large social network with a few hundred million users:
    • Think of the amount of data generated in tracking and managing those users
    • Think of how this data can be used for analytics
  • Business-critical applications should continue running without much impact even when there are one or more system failures (server failure, network failure, and so on). The applications should be able to handle failures gracefully, without any data loss or interruptions.
  • These applications should be able to scale across multiple data centers and geographical regions to support customers around the world with minimal delay. Modern applications should implement fully distributed architectures and be capable of scaling out horizontally to support any data size or any number of concurrent users.