Building Python Real-Time Applications with Storm

By: Kartik Bhatnagar, Barry Hart

Overview of this book

Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.” At the same time, Python is one of the fastest-growing programming languages today. It has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily. You will begin with some basic command tutorials to set up Storm and learn about its configuration in detail. You will then go through the requirement scenarios to create a Storm cluster. Next, you’ll be provided with an overview of Petrel, followed by an example of a Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.
Chapter 3. Introducing Petrel

As discussed in Chapter 1, Getting Acquainted with Storm, Storm is a platform for processing large amounts of data in real time. Storm applications are often written in Java, but Storm supports other languages as well, including Python. While the concepts are similar across languages, the details vary by language. In this chapter, we'll get our first hands-on experience using Storm with Python.

First, you'll learn about a Python library called Petrel, which is necessary for creating topologies in Python. Next, we'll set up our Python/Storm development environment. Then, we'll take a close look at a working Storm topology written in Python. Finally, we'll run the topology and you will learn some key techniques to ease the process of developing and debugging topologies. After you complete this chapter, you'll have a good high-level understanding of developing basic Storm topologies.

In this chapter, we will cover these topics:

  • What is Petrel?

  • Installing Petrel

