
Mastering RethinkDB

By: Shahid Shaikh

Overview of this book

RethinkDB has a lot of cool things to be excited about: ReQL (its readable, highly functional syntax), cluster management, primitives for 21st century applications, and changefeeds. This book starts with a brief overview of the RethinkDB architecture and data modeling, and coverage of the advanced ReQL queries used to work with JSON documents. Then, you will quickly jump to implementing these concepts in real-world scenarios by building real-time applications on polling, data synchronization, share market, and the geospatial domain using RethinkDB and Node.js. You will also see how to tweak RethinkDB's capabilities to ensure faster data processing by exploring the sharding and replication techniques in depth. Then, we will take you through the more advanced administration tasks and show you the various deployment techniques using PaaS, Docker, and Compose. By the time you have finished reading this book, you will have taken your knowledge of RethinkDB to the next level, and you will be able to use its concepts to develop efficient, real-time applications with ease.

Query execution in RethinkDB


The query engine is a critical component of RethinkDB. It performs various computations and internal logic operations to maintain high performance along with good system throughput.

Refer to the following diagram to understand query execution:

Upon the arrival of a query, RethinkDB divides it into various stacks. Each stack contains the methods and internal logic needed to perform its part of the operation, but three core methods play the key roles:

  • The first method decides how to execute the query, or a subset of the query, on each server in the cluster

  • The second method decides how to merge the data coming back from the various servers into a meaningful result

  • The third method, which is very important, deals with transmitting that data as a stream rather than as a whole
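The three core methods above can be sketched as a toy simulation. This is an illustrative model only, not RethinkDB's internal code; the server names and data are hypothetical:

```javascript
// Hypothetical per-server data, standing in for shards in a cluster.
const servers = [
  { name: 'server1', rows: [{ id: 1, score: 40 }, { id: 2, score: 75 }] },
  { name: 'server2', rows: [{ id: 3, score: 90 }, { id: 4, score: 55 }] },
];

// Method 1: execute the query (or a subset of it) on each server.
function executeOnServer(server, predicate) {
  return server.rows.filter(predicate);
}

// Method 2: merge the partial results coming back from the servers.
function mergeResults(partials) {
  return partials.flat();
}

// Method 3: transmit the merged data as a stream rather than as a whole.
function* stream(rows) {
  for (const row of rows) yield row;
}

const partials = servers.map(s => executeOnServer(s, r => r.score > 50));
const merged = mergeResults(partials);
const out = [...stream(merged)].map(r => r.id);
console.log(out); // logs [ 2, 3, 4 ]
```

Each server filters its own rows in isolation, mirroring how the real engine pushes work to where the data lives and only merges at the end.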

To speed up the process, these stacks are sent to every relevant server, and each server evaluates its portion in parallel with the others. This process runs recursively, merging the data into a stream for the client.

Each stack in the chain takes the data produced by the stack below it and applies its own execution and transformation methods. The data from each server is then combined into a single result set and streamed to the client.

To maintain high performance, every query is fully parallelized across the relevant servers in the cluster: each server executes its part of the query, and the partial results are merged back into a single result set.

The query engine maintains efficiency in the process too; for example, if a client only requests a result that does not reside on a sharded or replicated server, RethinkDB will skip the parallel operation and simply return the result set. This process is also referred to as lazy execution.
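The idea behind lazy execution can be illustrated with a generator: rows are produced only as the client pulls them, so a small request never triggers a full scan. This is a conceptual sketch, not RethinkDB's implementation:

```javascript
// Count how many rows the "scan" actually touches.
let scanned = 0;

function* tableScan(rows) {
  for (const row of rows) {
    scanned++;        // work happens only when a row is pulled
    yield row;
  }
}

// Pull at most n rows; stopping early stops the upstream scan too.
function take(iter, n) {
  const out = [];
  for (const row of iter) {
    out.push(row);
    if (out.length === n) break;
  }
  return out;
}

const rows = Array.from({ length: 1000 }, (_, i) => ({ id: i }));
const firstThree = take(tableScan(rows), 3);
console.log(firstThree.length, scanned); // 3 3
```

Although the table holds 1,000 rows, only three are ever scanned, because evaluation is driven by demand from the consumer.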

To maintain concurrency and high performance of query execution, RethinkDB uses block-level Multiversion Concurrency Control (MVCC). If one user is reading data while other users are writing to it, there is a high chance of inconsistent data, so a concurrency control algorithm is needed. One of the simplest and most commonly used methods, employed by SQL databases, is to lock the transaction, that is, make the user wait while a write operation is being performed on the data. This slows down the system, and since big data promises fast read times, it simply won't work.

Multiversion concurrency control takes a different approach. Each user sees a snapshot of the data (that is, a child copy of the master data), and if changes are in progress on the master copy, the child copies or snapshots are not updated until the change has been committed.
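A toy MVCC model makes the snapshot behavior concrete: a reader keeps the version it started with, while a writer commits by publishing a new immutable version. This is a conceptual illustration only, not RethinkDB's block-level implementation:

```javascript
// The "master copy" is an immutable version; commits swap the pointer.
let current = Object.freeze({ version: 1, balance: 100 });

function beginRead() {           // a reader's snapshot is just the
  return current;                // version current at read time
}

function commit(next) {          // a writer publishes a new version
  current = Object.freeze(next);
}

const snapshot = beginRead();    // reader starts
commit({ version: 2, balance: 50 }); // writer commits in the meantime

console.log(snapshot.balance);       // 100 — reader still sees its snapshot
console.log(beginRead().balance);    // 50  — new readers see the commit
```

The reader and writer never block each other; they simply observe different versions, which is the essence of MVCC.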

RethinkDB uses block-level MVCC, and this is how it works: whenever an update or write operation is performed during a read operation, RethinkDB takes a snapshot of each shard and maintains a different version of the affected block, so that every read and write operation can proceed in parallel. RethinkDB does use exclusive block-level locks when multiple updates target the same document, but these locks are held very briefly because the blocks are cached; hence the system appears to be lock-free.

RethinkDB provides atomicity at the level of the JSON document. This is different from most NoSQL systems, which provide atomicity only for each small operation performed on the document before the actual commit. RethinkDB does the opposite: it guarantees atomicity for a document no matter what combination of operations is performed on it.

For example, a user may want to read some data (say, the first name from one document), change it to uppercase, append the last name coming from another JSON document, and then update the JSON document. All of these operations will be performed atomically in a single update operation.
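The combined read-transform-update from that example can be sketched as a single atomic step: the whole transform runs against one version of the document, so no partially updated state is ever visible. In ReQL this would be one `update(...)` call; the code below is a plain in-memory stand-in with hypothetical document contents:

```javascript
const doc = { id: 1, firstName: 'shahid' };   // document being updated
const other = { id: 2, lastName: 'Shaikh' };  // second document being read

// In a real store, the entire transform would commit (or fail) as a unit.
function atomicUpdate(document, transform) {
  return { ...document, ...transform(document) };
}

const updated = atomicUpdate(doc, d => ({
  // uppercase the first name and append the other document's last name
  fullName: d.firstName.toUpperCase() + ' ' + other.lastName,
}));

console.log(updated.fullName); // SHAHID Shaikh
```

The key point is that the read, the transformation, and the write are expressed as one operation on the document rather than as separate steps that could interleave with other writers.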

RethinkDB limits this atomicity to certain operations. For example, values produced by JavaScript code cannot be written atomically, nor can the result of a subquery, and replace cannot be performed atomically.