Building Python Real-Time Applications with Storm

By: Kartik Bhatnagar, Barry Hart

Overview of this book

Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.” At the same time, Python is one of the fastest-growing programming languages today. It has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily. You will begin with some basic command tutorials to set up Storm and learn about its configuration in detail. You will then go through the requirement scenarios to create a Storm cluster. Next, you’ll be provided with an overview of Petrel, followed by an example of a Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.

Profiling your topology's performance


Performance can be a concern for any application. This is true for Storm topologies as well, perhaps more so.

When you're trying to push a lot of data through a topology, raw performance is certainly a concern: faster components mean that more data can be processed. But it's also important to understand the tuple processing performance of individual components. This information can be used in two ways.

The first is knowing which components are slower, because this tells you where to focus your attention if you are trying to make the code faster. Once you know which component (or components) is slow, you can use tools such as the Python cProfile module (http://pymotw.com/2/profile/) and the line profiler (https://github.com/rkern/line_profiler) to understand where the code is spending most of its time.
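As a rough illustration of the cProfile approach, the sketch below profiles a piece of per-tuple logic outside of Storm. The `split_sentence` function is a hypothetical stand-in for the body of a slow bolt, and `profile_component` is a helper invented for this example; neither comes from Storm or Petrel. Only the standard library `cProfile`, `pstats`, and `io` modules are used.

```python
import cProfile
import io
import pstats


def split_sentence(sentence):
    """Hypothetical stand-in for the per-tuple work of a slow bolt."""
    words = []
    for word in sentence.split():
        # Strip surrounding punctuation and normalize case.
        words.append(word.strip(".,!?").lower())
    return words


def profile_component(func, tuples, top=10):
    """Run func over sample tuples under cProfile; return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for tup in tuples:
        func(tup)
    profiler.disable()

    # Write the report into a string buffer, sorted by cumulative time,
    # so the most expensive call paths appear first.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample input; in practice, replay tuples captured from the topology.
    sample = ["The quick brown fox jumps over the lazy dog."] * 10000
    print(profile_component(split_sentence, sample))
```

Feeding the function a batch of representative tuples in isolation like this keeps Storm's own overhead out of the measurement, so the report reflects only the component's code.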

Even after profiling, some components will still be faster than others. In this case, understanding the relative performance between components can help you...