Building Python Real time Applications with Storm

Overview of this book

Big data is a trending concept that everyone wants to learn about. Because it can process all kinds of data in real time, Storm is an important addition to your big data "bag of tricks." At the same time, Python is one of the fastest-growing programming languages today and has become a top choice for both data science and everyday application development. Together, Storm and Python let you build and deploy real-time big data applications quickly and easily. You will begin with some basic command tutorials to set up Storm and learn about its configuration in detail. You will then work through the requirement scenarios for creating a Storm cluster. Next, you will get an overview of Petrel, followed by an example Twitter topology with persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.
Table of Contents (14 chapters)

Storm topology-specific terminology


A topology is a logical division of programming work into many small processing units called spouts and bolts, similar to a MapReduce job in Hadoop. A topology can be written in many languages, including Java, Python, and several other supported languages. In visual depictions, a topology is shown as a graph of connected spouts and bolts. Spouts and bolts execute tasks across the cluster. Storm has two modes of operation, local mode and distributed mode:

  • In local mode, all Storm processes and workers run within your development environment. This is useful for testing and developing topologies.

  • In distributed mode, Storm operates as a cluster of machines. When you submit topology code to Nimbus, the Storm master daemon, it distributes the code and allocates workers to run your topology according to your configuration.
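To make the spout/bolt vocabulary concrete, here is a minimal plain-Python sketch of a word-count topology run in a single process, loosely in the spirit of local mode. It does not use Storm or Petrel. The class names `SentenceSpout`, `SplitBolt`, `CountBolt` and the helper `run_locally` are hypothetical, chosen only for illustration, and tuples are represented as ordinary Python tuples without Storm's named fields.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a Storm tuple: a plain Python tuple of values.
# Real Storm tuples also carry named fields, which this sketch omits.
StormTuple = Tuple


class SentenceSpout:
    """Hypothetical spout: the source of the stream, emitting one tuple per sentence."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self.sentences = list(sentences)

    def emit(self) -> Iterator[StormTuple]:
        for sentence in self.sentences:
            yield (sentence,)


class SplitBolt:
    """Hypothetical bolt: splits each sentence tuple into one lowercase word tuple per word."""

    def process(self, tup: StormTuple) -> List[StormTuple]:
        (sentence,) = tup
        return [(word.lower(),) for word in sentence.split()]


class CountBolt:
    """Hypothetical stateful bolt: keeps a running count per word."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, tup: StormTuple) -> List[StormTuple]:
        (word,) = tup
        self.counts[word] += 1
        return [(word, self.counts[word])]


def run_locally(spout: SentenceSpout, bolts: List) -> List[StormTuple]:
    """Push every spout tuple through a linear chain of bolts inside one process."""
    outputs: List[StormTuple] = []
    for tup in spout.emit():
        batch = [tup]
        for bolt in bolts:
            # Each bolt may emit zero or more tuples for every tuple it receives.
            batch = [out for t in batch for out in bolt.process(t)]
        outputs.extend(batch)
    return outputs


if __name__ == "__main__":
    counter = CountBolt()
    spout = SentenceSpout(["the cow jumped", "the moon"])
    print(run_locally(spout, [SplitBolt(), counter]))
    # → [('the', 1), ('cow', 1), ('jumped', 1), ('the', 2), ('moon', 1)]
```

In distributed mode, Storm would instead run many instances of each spout and bolt in parallel across worker processes on different machines and route tuples between them. This sketch deliberately keeps everything sequential so the data flow through the graph is easy to follow.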

In the following figure, the purple bolts receive tuples (records) from the spout above them. A tuple supports...