Building Python Real-Time Applications with Storm

Overview of this book

Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.” At the same time, Python is one of the fastest-growing programming languages today and has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily. You will begin with some basic command tutorials to set up Storm and learn about its configuration in detail. You will then go through the requirement scenarios for creating a Storm cluster. Next, you’ll get an overview of Petrel, followed by an example Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.
Table of Contents (14 chapters)

Creating your first topology


Now, we'll create a Storm topology that breaks sentences into words and then counts the number of occurrences of each word. Implementing this topology in Storm requires the following components:

  • Sentence spout (randomsentence.py): A topology always begins with a spout; that's how data gets into Storm. The sentence spout will emit an infinite stream of sentences.

  • Splitter bolt (splitsentence.py): This receives sentences and splits them into words.

  • Word count bolt (wordcount.py): This receives words and counts the occurrences. For each word processed, it outputs the word along with the number of times that word has been seen so far.
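
Before wiring these components into Storm, it can help to see the same three stages as ordinary Python. The following is a minimal sketch that runs without Storm or Petrel; the function names, the SENTENCES list, and the run_pipeline helper are hypothetical and exist only for illustration. They are not part of the Petrel API or of the book's source files.

```python
import itertools
import random
from collections import Counter

# Illustrative sample data (an assumption, not taken from the book's spout).
SENTENCES = [
    "the cow jumped over the moon",
    "an apple a day keeps the doctor away",
]


def sentence_spout(rng):
    """Stand-in for the sentence spout: yield an endless stream of sentences."""
    while True:
        yield rng.choice(SENTENCES)


def split_sentence(sentence):
    """Stand-in for the splitter bolt: one sentence in, a list of words out."""
    return sentence.split()


def word_count(words):
    """Stand-in for the word count bolt: yield (word, running count) per word."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def run_pipeline(num_sentences, seed=0):
    """Chain the three stages over a bounded number of sentences."""
    rng = random.Random(seed)
    # The spout is infinite, so take only the first num_sentences items.
    sentences = itertools.islice(sentence_spout(rng), num_sentences)
    words = (w for s in sentences for w in split_sentence(s))
    return list(word_count(words))


if __name__ == "__main__":
    for word, count in run_pipeline(2):
        print(word, count)
```

In a real topology each stage runs as a separate component and Storm moves the tuples between them; here plain generators play that role, so the sketch shows only the shape of the data flow, not Storm's distribution or reliability features.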

The following figure shows how data flows through the topology:

Word count topology

Now that we've seen the basic data flow, let's implement the topology and see how it works.

Sentence spout

In this section, we implement a spout that generates random sentences. Enter this code in a file called randomsentence.py:

import time
import random

from petrel import storm
from petrel...