Building Python Real-Time Applications with Storm

Overview of this book

Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.” At the same time, Python is one of the fastest-growing programming languages today and has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily. You will begin with some basic command tutorials to set up Storm and learn about its configuration in detail. You will then go through the requirement scenarios for creating a Storm cluster. Next, you’ll get an overview of Petrel, followed by an example Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.
Table of Contents (14 chapters)

Storm processes


We will start with Nimbus, the entry-point daemon in Storm. To compare with Hadoop, Nimbus plays the role of the JobTracker. Nimbus distributes code to all supervisor daemons in the cluster, so when topology code is submitted, it reaches every physical machine in the cluster. Nimbus is also responsible for assigning tasks to supervisor nodes and for monitoring supervisor failures. If a supervisor keeps failing, Nimbus reassigns that supervisor's work to workers on a different physical machine. The current version of Storm allows only one instance of the Nimbus daemon to run. If you lose Nimbus, the workers continue to compute, and supervisors continue to restart workers as and when they die. Without Nimbus, however, a worker's tasks cannot be reassigned to a worker on another machine in the cluster.
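The behavior described above can be sketched with a small toy model. This is only an illustration of the idea and not Storm code: the `ToyCluster` class, its method names, the round-robin placement, and the supervisor and task names are all assumptions made for the example.

```python
# Toy model only: ToyCluster is hypothetical and is not part of Storm's API.
class ToyCluster:
    def __init__(self, supervisors):
        self.alive = set(supervisors)   # supervisors that are currently up
        self.assignments = {}           # task name -> supervisor name
        self.nimbus_up = True

    def submit(self, tasks):
        """Nimbus spreads tasks round-robin over the live supervisors."""
        if not self.nimbus_up:
            raise RuntimeError("no Nimbus: cannot submit a topology")
        hosts = sorted(self.alive)
        for i, task in enumerate(tasks):
            self.assignments[task] = hosts[i % len(hosts)]

    def supervisor_fails(self, name):
        """A machine is lost; only a running Nimbus can move its tasks."""
        self.alive.discard(name)
        hosts = sorted(self.alive)
        if not self.nimbus_up or not hosts:
            return  # tasks stay assigned to the dead machine
        orphaned = sorted(t for t, s in self.assignments.items() if s == name)
        for i, task in enumerate(orphaned):
            self.assignments[task] = hosts[i % len(hosts)]

    def running_tasks(self):
        """Tasks whose assigned supervisor is still alive."""
        return sorted(t for t, s in self.assignments.items() if s in self.alive)


cluster = ToyCluster(["sup1", "sup2", "sup3"])
cluster.submit(["spout", "bolt-a", "bolt-b"])

cluster.supervisor_fails("sup1")   # Nimbus is up, so "spout" is moved
cluster.nimbus_up = False          # Nimbus is lost
cluster.supervisor_fails("sup3")   # nobody reassigns "bolt-b"

print(cluster.assignments)
print(cluster.running_tasks())
```

In this sketch, the tasks on the surviving machine keep running after Nimbus goes away, but the task that was on the second failed machine is stranded, which mirrors the point that reassignment across machines depends on Nimbus.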

There is no alternative Storm process that will take over if Nimbus dies, and no process will even try to restart...