Getting Started with CockroachDB

By: Kishen Das Kondabagilu Rajanna

Overview of this book

Getting Started with CockroachDB will introduce you to the inner workings of CockroachDB and help you to understand how it provides faster access to distributed data through a SQL interface. The book will also uncover how you can use the database to provide solutions where the data is highly available. Starting with CockroachDB's installation, setup, and configuration, this SQL book will familiarize you with the database architecture and database design principles. You'll then discover several options that CockroachDB provides to store multiple copies of your data to ensure fast data access. The book covers the internals of CockroachDB, how to deploy and manage it on the cloud, performance tuning to get the best out of CockroachDB, and how to scale data across continents and serve it locally. In addition to this, you'll get to grips with fault tolerance and auto-rebalancing, how indexes work, and the CockroachDB Admin UI. The book will guide you in building scalable cloud services on top of CockroachDB, covering administrative and security aspects and tips for troubleshooting, performance enhancements, and a brief guideline on migrating from traditional databases. By the end of this book, you'll have gained sufficient knowledge to manage your data on CockroachDB and interact with it from your application layer.
Table of Contents (17 chapters)
Section 1: Getting to Know CockroachDB
Section 2: Exploring the Important Features of CockroachDB
Section 3: Working with CockroachDB
Appendix: Bibliography and Additional Resources

CAP theorem

In 2000, Eric A. Brewer gave a keynote talk titled Towards Robust Distributed Systems at the ACM Symposium on Principles of Distributed Computing (PODC), summarizing what he had learned from years of working on distributed systems. Brewer discussed three key properties of a distributed system: consistency, availability, and tolerance of network partitions. Consistency means that every read sees the data from the most recent write or else returns an error. Availability means that every read or write request receives a non-error response, although that response is not guaranteed to reflect the most recent write. Partition tolerance means that the system continues to operate despite delays and communication failures between its nodes. The Consistency, Availability, and Partition Tolerance (CAP) theorem states that a distributed system can guarantee at most two of these three properties at the same time.

Consistency and partition tolerance (CP)

A CP database provides consistency and partition tolerance at the cost of availability: during a partition, it may refuse requests rather than return stale data. Such a system is also called CAP-consistent. Let's understand this by looking at an example:

Figure 1.7 – CP system

Let's consider the system shown in the preceding diagram, where two servers are serving read and write traffic. For this example, let's say writes only land on Server 1 and reads only land on Server 2. As long as Server 1 can talk to Server 2, every write that comes to Server 1 can be propagated synchronously to Server 2. This ensures that any read that comes to Server 2 is consistent, which means it sees the data from the most recent write to Server 1:

Figure 1.8 – CP system during a communication failure

Now, let's say that, as shown in the preceding diagram, the communication between Server 1 and Server 2 has broken down, so Server 1 can no longer propagate writes synchronously. The system is now partitioned. Because data cannot be propagated between the two servers, the system must stop serving read and write traffic until the partition is resolved; otherwise, it could not guarantee data consistency.
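
The CP behavior described above can be illustrated with a small Python sketch. This is a toy model, not how CockroachDB or any real database implements replication: the `CPServer` class, the `PartitionError` exception, and the `connect` and `partition` helpers are all hypothetical names introduced only for this example. It assumes two in-memory servers and a single flag that represents the state of the link between them.

```python
class PartitionError(Exception):
    """Raised when a request is refused in order to protect consistency."""


class CPServer:
    """Hypothetical node in a two-server CP system (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peer = None
        self.link_up = False  # whether the peer is currently reachable

    def write(self, key, value):
        if not self.link_up:
            # The write cannot be replicated synchronously, so refuse it.
            raise PartitionError(f"{self.name}: write refused during partition")
        # Synchronous replication: both copies are updated before returning.
        self.data[key] = value
        self.peer.data[key] = value

    def read(self, key):
        if not self.link_up:
            # The local copy might be stale, so refuse the read.
            raise PartitionError(f"{self.name}: read refused during partition")
        return self.data.get(key)


def connect(a, b):
    """Link two servers so that they can replicate to each other."""
    a.peer, b.peer = b, a
    a.link_up = b.link_up = True


def partition(a, b):
    """Simulate a communication failure between the two servers."""
    a.link_up = b.link_up = False


server1, server2 = CPServer("Server 1"), CPServer("Server 2")
connect(server1, server2)

server1.write("balance", 100)
healthy_read = server2.read("balance")  # sees the latest write

partition(server1, server2)

try:
    server1.write("balance", 50)
    write_outcome = "accepted"
except PartitionError:
    write_outcome = "refused"

try:
    server2.read("balance")
    read_outcome = "served"
except PartitionError:
    read_outcome = "refused"

print(healthy_read, write_outcome, read_outcome)
```

While the link is healthy, the read on Server 2 returns the value written on Server 1. Once the link is cut, both the write and the read are refused, which is the loss of availability that a CP system accepts in exchange for consistency.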

Some of the most popular databases that have CP characteristics are HBase, Couchbase, and MongoDB. CockroachDB also falls into this category.

Availability and partition tolerance (AP)

In this case, a database is guaranteed to always be available and it can tolerate partitioning, but at the cost of consistency. This is also known as a CAP-available system. Here, the application is expected to deal with data consistency:

Figure 1.9 – AP system during a communication failure

Similar to the previous example, if the communication between Server 1 and Server 2 breaks down, both servers continue to serve traffic. However, reads from Server 1 and Server 2 might return different versions of the data, depending on when the communication failed and whether the data changed after the failure. Cassandra, Riak, and CouchDB are popular examples of AP databases.
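
The same two-server setup can be sketched for the AP case. Again, this is only a toy model under stated assumptions: the `APServer` class and its `link_up` flag are hypothetical and do not reflect how Cassandra, Riak, or CouchDB work internally. In this sketch, each server always answers from its local copy and replicates only when the peer is reachable.

```python
class APServer:
    """Hypothetical node in a two-server AP system (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peer = None
        self.link_up = False  # whether the peer is currently reachable

    def write(self, key, value):
        # Always accept the write locally, even during a partition.
        self.data[key] = value
        if self.link_up:
            # Replicate only when the peer can be reached.
            self.peer.data[key] = value

    def read(self, key):
        # Always answer from the local copy, which may be stale.
        return self.data.get(key)


server1, server2 = APServer("Server 1"), APServer("Server 2")
server1.peer, server2.peer = server2, server1
server1.link_up = server2.link_up = True

server1.write("balance", 100)  # replicated to Server 2

# Simulate a communication failure between the two servers.
server1.link_up = server2.link_up = False

server1.write("balance", 50)  # accepted, but not replicated

value_on_server1 = server1.read("balance")
value_on_server2 = server2.read("balance")
diverged = value_on_server1 != value_on_server2

print(value_on_server1, value_on_server2, diverged)
```

Both servers keep responding after the failure, but Server 2 still returns the value from before the partition while Server 1 returns the newer one. Reconciling the two copies once communication is restored is left out of this sketch; as noted earlier, an AP system expects the application to deal with that inconsistency.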

Consistency and availability (CA)

In the case of a CA database, the system cannot tolerate partitioning but can guarantee consistency and availability. Traditional databases deployed on a single server, with no replication or secondary servers, can be classified as CA. Many traditional RDBMSs can also be configured to behave as CA, CP, or AP systems, as desired.