Building Python Real-Time Applications with Storm

By: Kartik Bhatnagar, Barry Hart

Overview of this book

Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.” At the same time, Python is one of the fastest-growing programming languages today. It has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily. You will begin with some basic command tutorials to set up Storm and learn about its configurations in detail. You will then go through the requirement scenarios to create a Storm cluster. Next, you’ll be provided with an overview of Petrel, followed by an example of a Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.

Testing a bolt

Storm makes it easy to deploy and run Python topologies, but developing and testing them in Storm is challenging, whether running in standalone Storm or a full Storm deployment:

  • Storm launches programs on your behalf—not only your Python code but auxiliary Java processes as well

  • It controls the Python components' standard input and output channels

  • The Python programs must respond regularly to heartbeat messages or be shut down by Storm

This makes it difficult to debug Storm topologies with the tools and techniques typically used for other Python code, such as running a script from the command line and stepping through it with pdb.

Petrel's mock module helps us with this. It provides a simple, standalone Python container for testing simple topologies and verifying that the expected results are returned.
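
The idea of a standalone test container can be illustrated without Storm at all. The following is a minimal pure-Python sketch of that idea, not Petrel's implementation or API: the names FixedSource, SplitWordsBolt, and run_linear_topology are hypothetical and exist only for this illustration. It assumes a linear chain of components in which each one receives the tuples emitted by the one before it.

```python
class FixedSource:
    """Hypothetical stand-in for a spout: emits a fixed list of tuples."""

    def __init__(self, tuples):
        self.tuples = list(tuples)

    def emit_all(self):
        return list(self.tuples)


class SplitWordsBolt:
    """Hypothetical bolt: turns one sentence tuple into one tuple per word."""

    def process(self, tup):
        sentence = tup[0]
        return [[word] for word in sentence.split()]


def run_linear_topology(source, bolts):
    """Push every tuple from the source through the bolts, in order.

    Returns a dict mapping each bolt to the list of tuples it emitted,
    so a test can check the output of any stage in the chain.
    """
    results = {}
    pending = source.emit_all()
    for bolt in bolts:
        emitted = []
        for tup in pending:
            emitted.extend(bolt.process(tup))
        results[bolt] = emitted
        pending = emitted  # the next bolt consumes this bolt's output
    return results


# Everything runs in one ordinary Python process, so pdb and unit test
# frameworks work as usual.
source = FixedSource([["Madam, I'm Adam."]])
bolt = SplitWordsBolt()
results = run_linear_topology(source, [bolt])
assert results[bolt] == [["Madam,"], ["I'm"], ["Adam."]]
```

Because the harness has no Java processes, no captured standard streams, and no heartbeats, a failing assertion can be investigated with the same tools as any other Python test.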

In Petrel terms, a simple topology is one that only outputs to the default stream and has no branches or loops. The run_simple_topology() function assumes that the first component...