Cassandra High Availability

With relational databases, we store each data entity in its own table, and then join the tables to form the desired view at query time. If we apply this idea to a distributed database like Cassandra, we end up with a distributed join, where the rows being combined may live on different nodes.

New Cassandra developers, especially those who come from a relational database background, are particularly prone to following this pattern. In the previous chapter, we mentioned that denormalization is key to successful data modeling in Cassandra, and our discussion of secondary indices can help explain the reasons for this.

Note

If you find yourself querying multiple large tables, then joining them in your application based on some shared key, you are performing a distributed join. This should almost always be avoided in favor of a denormalized data model. The only exception is for very small lookup tables that can fit easily in memory. Otherwise, you should always write your data the way you intend to read it.
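To make the contrast in this note concrete, here is a minimal sketch in plain Python. It does not use any Cassandra API: the dictionaries and lists are hypothetical in-memory stand-ins for tables, and every name (`users`, `orders`, `orders_by_user`, `write_order`) is an assumption made for illustration. The first function mimics an application-side join across two tables; the second reads from a structure that was written in the shape of the query.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two tables; this is not a Cassandra API.
users = {
    1: {"name": "alice"},
    2: {"name": "bob"},
}
orders = [
    {"order_id": 100, "user_id": 1, "total": 25.0},
    {"order_id": 101, "user_id": 2, "total": 40.0},
    {"order_id": 102, "user_id": 1, "total": 15.5},
]


def orders_with_names_joined(user_id):
    """Application-side join: read one table, then look up the other per row."""
    result = []
    for order in orders:                       # first read: scan the orders
        if order["user_id"] == user_id:
            user = users[order["user_id"]]     # second read for each match
            result.append({**order, "name": user["name"]})
    return result


# Denormalized alternative: one structure keyed by the query we intend to run.
orders_by_user = defaultdict(list)


def write_order(order_id, user_id, name, total):
    """Write the row in the shape it will be read, duplicating the user name."""
    orders_by_user[user_id].append(
        {"order_id": order_id, "user_id": user_id, "total": total, "name": name}
    )


# Populate the denormalized structure at write time.
for o in orders:
    write_order(o["order_id"], o["user_id"], users[o["user_id"]]["name"], o["total"])


def orders_with_names_denormalized(user_id):
    """Single lookup by key; no join is needed at read time."""
    return list(orders_by_user[user_id])
```

Both functions return the same rows, but the denormalized version answers the query with one keyed lookup. The cost is that the user name is duplicated on every order, which is the trade-off denormalization accepts in exchange for cheap reads.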

At this point you should be familiar enough...

By: Robbie Strickland
