
Mastering RethinkDB

By: Shahid Shaikh

Overview of this book

RethinkDB has a lot of cool things to be excited about: ReQL (its readable, highly functional syntax), cluster management, primitives for 21st-century applications, and changefeeds. This book starts with a brief overview of the RethinkDB architecture and data modeling, along with coverage of the advanced ReQL queries used to work with JSON documents. Then, you will quickly move on to implementing these concepts in real-world scenarios by building real-time applications for polling, data synchronization, the share market, and the geospatial domain using RethinkDB and Node.js. You will also see how to tune RethinkDB's capabilities for faster data processing by exploring sharding and replication techniques in depth. We will then take you through more advanced administration tasks and show you various deployment techniques using PaaS, Docker, and Compose. By the time you have finished reading this book, you will have taken your knowledge of RethinkDB to the next level and will be able to use its concepts to develop efficient, real-time applications with ease.

Chapter 3. Data Exploration Using RethinkDB

Data exploration is the process of analyzing and refactoring structured or unstructured data, and it is commonly done before moving on to the actual data analysis. Operations such as cleaning up duplicates and finding fields that contain stray whitespace can be performed at the data exploration stage.
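As a minimal sketch of the two checks mentioned above, the following Node.js snippet runs a whitespace scan and a duplicate scan over an in-memory array of hypothetical records; the same predicates could later be expressed as ReQL `filter` queries against a RethinkDB table.

```javascript
// Hypothetical sample records for illustration.
const records = [
  { id: 1, name: 'Alice' },
  { id: 2, name: ' Bob ' },  // leading/trailing whitespace
  { id: 3, name: 'Alice' },  // duplicate value
];

// Pass 1: find records whose field has leading or trailing whitespace.
const whitespace = records.filter(r => /^\s|\s$/.test(r.name));

// Pass 2: find records whose (trimmed) field value was already seen.
const seen = new Set();
const duplicates = records.filter(r => {
  const key = r.name.trim();
  if (seen.has(key)) return true;
  seen.add(key);
  return false;
});

console.log(whitespace.map(r => r.id)); // [ 2 ]
console.log(duplicates.map(r => r.id)); // [ 3 ]
```

Running these cheap passes first means the expensive analysis stage never sees records that would need to be discarded anyway.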

We can treat data exploration as a preemptive step before running heavy operations such as batch jobs, which are computationally expensive; discovering irrelevant data at that later stage would be painful.

Data exploration can be very useful in various scenarios. Suppose you have a large dataset covering the DNA diversity of people living in New York, or terabytes of data from NASA about Mars' temperature records. There is a strong possibility that such data contains errors. So, instead of feeding terabytes of raw data directly into a program written in R, we can first make the data less error prone, which will certainly produce results faster.

Concepts such as those...