Cassandra Design Patterns - Second Edition

By : Rajanarayanan Thottuvaikkatumana
Overview of this book

If you are new to Cassandra but well-versed in RDBMS modeling and design, it is natural to model data in Cassandra the same way. The result is poorly performing applications that miss the real purpose of Cassandra. If you want to learn to make the most of Cassandra, this book is for you. It starts with strategies to integrate Cassandra with legacy data stores and progresses to the ways in which a migration from RDBMS to Cassandra can be accomplished. The journey continues with ideas for migrating data from cache solutions to Cassandra. With the stage set, the book moves on to some of the most common problems applications face when dealing with consistency, availability, and partition tolerance guarantees. Cassandra is exceptionally good at dealing with temporal data, so patterns such as the time-series pattern and the log pattern are covered next. Many NoSQL data stores fail miserably when a huge amount of data is read for analytical purposes, but Cassandra is different in this regard. Keeping analytical needs in mind, you’ll walk through a range of interesting design patterns. No theoretical discussion is complete without a good set of use cases, so the book concludes with a set of use cases to which you can apply the patterns you’ve learned.
Table of Contents (9 chapters)

Chapter 1. Co-existence Patterns

 

"It's coexistence or no existence"
--Bertrand Russell

Relational Database Management Systems (RDBMS) have been pervasive since the 1970s. It is very difficult to find an organization without an RDBMS in its solution stack. Huge efforts have gone into the standardization of RDBMS. Because of that, if you are familiar with one RDBMS, switching to another is not a big problem: you remain in the same paradigm without any major shifts. Nearly all RDBMS vendors offer a core set of features with standard interfaces and then add their own value-added features on top. There is a standardized language for interacting with an RDBMS, called Structured Query Language (SQL). Queries written against one RDBMS will work in another without significant changes. From a skill set perspective, this is a big advantage because you need not learn and relearn new dialects of the query language as the products evolve. This standardization makes migration from one RDBMS to another relatively painless. Many application designers build their applications in an RDBMS-agnostic way. In other words, the application works with multiple RDBMS: change some properties in its configuration file, and it starts working with a different but supported RDBMS. Many software products are designed this way so that customers can use their preferred RDBMS.

In most RDBMS, a database schema organizes objects such as tables, views, indexes, stored procedures, and sequences into a logical group. Structured and related data is stored in tables as rows and columns. The primary key of a table uniquely identifies a row. There is a very strong theoretical foundation for the way data is stored in a table.

A table consists of rows and columns. Columns define the fields, and each row holds the data values for those fields. Rows are also called records or tuples. Tuple calculus, which was introduced by Edgar F. Codd as part of the relational model, serves as the basis for SQL in this type of data model. Redundancy is avoided as much as possible. Wikipedia defines database normalization as follows:

"Database normalization is the process of organizing the attributes and tables of a relational database to minimize data redundancy."

Since the emphasis is on avoiding redundancy, related data is spread across multiple tables, which are joined together with SQL to present data in various application contexts. Indexes defined on various columns of a table help with data retrieval, sorting, and maintaining data integrity.
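The idea of normalized tables joined at query time can be sketched with a small, self-contained example. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for any RDBMS, and the `customer` and `customer_order` tables, the `idx_order_customer` index, and the `orders_for` helper are all hypothetical names that do not come from the book.

```python
import sqlite3

# Hypothetical schema for illustration: each customer is stored once,
# and orders refer to the customer by primary key instead of repeating the name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
    -- Index on the join column to help retrieval.
    CREATE INDEX idx_order_customer ON customer_order(customer_id);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    [(100, 1, "keyboard"), (101, 1, "monitor")],
)


def orders_for(conn, customer_id):
    """Join the two tables to present a customer's name next to each order."""
    return conn.execute(
        """
        SELECT c.name, o.item
        FROM customer AS c
        JOIN customer_order AS o ON o.customer_id = c.customer_id
        WHERE c.customer_id = ?
        ORDER BY o.order_id
        """,
        (customer_id,),
    ).fetchall()


print(orders_for(conn, 1))
```

The customer's name is stored in exactly one row, and the join reassembles it with the related orders whenever the application needs that view of the data.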

In recent years, the amount of data generated by various applications has become huge, and traditional RDBMS have started showing their age. Most RDBMS were not able to ingest the various types of data into their schemas. When data flows in at high speed, a traditional RDBMS often becomes the bottleneck: at such write rates, it quickly becomes necessary to add more nodes to the RDBMS cluster, and SQL performance degrades on a distributed RDBMS. In other words, as we entered the era of big data, RDBMS could not handle the three Vs of data: Volume, Variety, and Velocity.

Many RDBMS vendors came up with solutions for handling the three Vs of data, but these came at a huge cost. The software licensing, the sophisticated hardware required, and the related ecosystem needed to build a fault-tolerant solution stack started affecting the bottom line in a big way. New-generation Internet companies started looking for different solutions to this problem, and very specialized data stores, based on some popular research papers, started emerging from these organizations and from open source communities. These data stores are generally termed NoSQL data stores, and they address very specific data storage and retrieval needs. Cassandra is one of the highly successful NoSQL data stores, and its abstractions have a few similarities with those of a typical RDBMS. This similarity comes in handy when Cassandra is adopted by an enterprise, because new users can relate what they know from RDBMS to Cassandra. From a logical perspective, Cassandra tables look similar to RDBMS tables to their users, even though the underlying structures of these tables are totally different. Because of this, Cassandra is well suited to being deployed alongside a traditional RDBMS to solve some of the problems that the RDBMS is not able to handle.

The caveat is that, because RDBMS tables and Cassandra column families (also known as Cassandra tables) look similar to end users, many users and data modelers try to model and use Cassandra in exactly the same way as an RDBMS schema, and then run into serious deployment issues. How do you prevent such pitfalls? At the outset, Cassandra may look like a traditional RDBMS data store, but it is not the same. The key is to understand the differences from both a theoretical and a practical perspective, and to follow the best practices prescribed by the creators of Cassandra.

Tip

In Cassandra, the terms "column family" and "table" are synonymous. The Cassandra Query Language (CQL) command syntax uses the term "table."

Why can Cassandra be used along with an RDBMS? The answer lies in the limitations of RDBMS. Some of the obvious drivers are cost savings, the need to scale out, the need to handle high-volume traffic, complex queries that slow down response times, and increasingly complex data types, and the list goes on. The most important reason for Cassandra to coexist with a legacy RDBMS is that you need to preserve the investments already made and make sure that the current applications keep working without any problems. So, you should protect your investments, make your future investments in a smart NoSQL store such as Cassandra, and follow a one-step-at-a-time approach.
