Book Image

Mastering RethinkDB

By : Shahid Shaikh
Book Image

Mastering RethinkDB

By: Shahid Shaikh

Overview of this book

RethinkDB has a lot of cool things to be excited about: ReQL (its readable,highly-functional syntax), cluster management, primitives for 21st century applications, and change-feeds. This book starts with a brief overview of the RethinkDB architecture and data modeling, and coverage of the advanced ReQL queries to work with JSON documents. Then, you will quickly jump to implementing these concepts in real-world scenarios, by building real-time applications on polling, data synchronization, share market, and the geospatial domain using RethinkDB and Node.js. You will also see how to tweak RethinkDB's capabilities to ensure faster data processing by exploring the sharding and replication techniques in depth. Then, we will take you through the more advanced administration tasks as well as show you the various deployment techniques using PaaS, Docker, and Compose. By the time you have finished reading this book, you would have taken your knowledge of RethinkDB to the next level, and will be able to use the concepts in RethinkDB to develop efficient, real-time applications with ease.
Table of Contents (16 chapters)
Mastering RethinkDB
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface

RethinkDB architectural components


RethinkDB's architecture consists of various components such as cluster, query execution engine, filesystem storage, push changes (real-time feed), and of course RethinkDB client drivers.

Refer to the following diagram to understand the block-level components of RethinkDB:

Client drivers

RethinkDB provides official client drivers for Node.js, Python, Ruby, and Java and various non official community drivers which are listed at the official website (https://rethinkdb.com/docs/install-drivers/). At the time of writing this book, only these languages were supported. In this book, we will refer to code examples with Node.js.

RethinkDB query engine

RethinkDB query handler, as name implies, performs query execution and returns the response to the client. It does so by performing lot of internal operations such as sorting, indexing, finding the cluster, or merging data from various clusters. All of these operations are performed by RethinkDB query handler. We will look at this in detail in the upcoming section.

RethinkDB clusters

RethinkDB is a distributed database designed for high-performance, real-time operations. RethinkDB manages distribution by clustering (sharding or replication). RethinkDB clusters are just another instance of the main process of RethinkDB and store data. We will look at sharding and replication in detail in the upcoming section.

Pushing changes to a RethinkDB client

This is a revolutionary concept introduced by RethinkDB. Consider this scenario: you are developing an application for the stock market where there are too many changes in a given amount of time. Obviously, we are storing every entry in the database and making sure that other connected nodes or clients know about these changes.

In order to do so, the conventional way is to keep looking (polling) for the data in the particular collection or table in order to find some changes. This improves the latency and turnaround time of packets, and we all know that a network call in a wide area network (WAN) is really costly. An HTTP call in a WAN is really costly.

Then came something called socket. In this, we do the polling operation but from the socket layer, not the HTTP layer. Here, the size of network requests may get reduced, but still we do the polling.

Note

Socket.io is one of the popular projects available for real-time web development.

RethinkDB proposes a reverse approach of this: what about the database itself tells you:

Hey, there are some changes happen in stock value and here are the new and old value.

This is exactly what RethinkDB push changes (change feed in technical terms) does. Once you subscribe to a particular table to look for its changes, RethinkDB just keeps pushing the old and new values of changes to the connected client. By "connected client," I meant a RethinkDB client and not a web application client. The difference between polling and push changes is shown here:

So you will get the changes in the data in one of the RethinkDB clients, say Node.js. And then you can simply broadcast it over the network, using socket probably.

But why are we using socket when RethinkDB can provide us the changes in the data? Because RethinkDB provides it to the middle layer and not the client layer, having a client layer directly talk to the client can be risky. Hence it has not been allowed yet.

But the RethinkDB team is working on another project called Horizon, which solves the issue mentioned previously, to allow clients to communicate to the database using secure layer of the middle tier. We will look at Horizon in detail in Chapter 10, Using RethinkDB and Horizon.