RethinkDB architectural components
RethinkDB's architecture consists of various components such as cluster, query execution engine, filesystem storage, push changes (real-time feed), and of course RethinkDB client drivers.
Refer to the following diagram to understand the block-level components of RethinkDB:
Client drivers
RethinkDB provides official client drivers for Node.js, Python, Ruby, and Java and various non official community drivers which are listed at the official website (https://rethinkdb.com/docs/install-drivers/). At the time of writing this book, only these languages were supported. In this book, we will refer to code examples with Node.js.
RethinkDB query engine
RethinkDB query handler, as name implies, performs query execution and returns the response to the client. It does so by performing lot of internal operations such as sorting, indexing, finding the cluster, or merging data from various clusters. All of these operations are performed by RethinkDB query handler. We will look at this in detail in the upcoming section.
RethinkDB clusters
RethinkDB is a distributed database designed for high-performance, real-time operations. RethinkDB manages distribution by clustering (sharding or replication). RethinkDB clusters are just another instance of the main process of RethinkDB and store data. We will look at sharding and replication in detail in the upcoming section.
Pushing changes to a RethinkDB client
This is a revolutionary concept introduced by RethinkDB. Consider this scenario: you are developing an application for the stock market where there are too many changes in a given amount of time. Obviously, we are storing every entry in the database and making sure that other connected nodes or clients know about these changes.
In order to do so, the conventional way is to keep looking (polling) for the data in the particular collection or table in order to find some changes. This improves the latency and turnaround time of packets, and we all know that a network call in a wide area network (WAN) is really costly. An HTTP call in a WAN is really costly.
Then came something called socket. In this, we do the polling operation but from the socket layer, not the HTTP layer. Here, the size of network requests may get reduced, but still we do the polling.
Note
Socket.io is one of the popular projects available for real-time web development.
RethinkDB proposes a reverse approach of this: what about the database itself tells you:
Hey, there are some changes happen in stock value and here are the new and old value.
This is exactly what RethinkDB push changes (change feed in technical terms) does. Once you subscribe to a particular table to look for its changes, RethinkDB just keeps pushing the old and new values of changes to the connected client. By "connected client," I meant a RethinkDB client and not a web application client. The difference between polling and push changes is shown here:
So you will get the changes in the data in one of the RethinkDB clients, say Node.js. And then you can simply broadcast it over the network, using socket probably.
But why are we using socket when RethinkDB can provide us the changes in the data? Because RethinkDB provides it to the middle layer and not the client layer, having a client layer directly talk to the client can be risky. Hence it has not been allowed yet.
But the RethinkDB team is working on another project called Horizon, which solves the issue mentioned previously, to allow clients to communicate to the database using secure layer of the middle tier. We will look at Horizon in detail in Chapter 10, Using RethinkDB and Horizon.