Building Python Real-Time Applications with Storm

By : Kartik Bhatnagar, Barry Hart
Overview of this book

Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.” At the same time, Python is one of the fastest-growing programming languages today. It has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily. You will begin with some basic command tutorials to set up Storm and learn about its configuration in detail. You will then go through the requirement scenarios to create a Storm cluster. Next, you’ll be given an overview of Petrel, followed by an example Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.

Finding the top n ranked topics using Redis


The topology computes a rolling ranking of the most popular words over the past 5 minutes. The word counts are stored in individual 60-second windows. The topology consists of the following components:

  • Twitter stream spout (twitterstream.py): This reads tweets from the Twitter sample stream. This spout is unchanged from Chapter 4, Example Topology – Twitter.

  • Splitter bolt (splitsentence.py): This receives tweets and splits them into words. This is also identical to the one in Chapter 4, Example Topology – Twitter.

  • Rolling word count bolt (rollingcount.py): This receives words and counts their occurrences. The Redis keys look like twitter_word_count:<start time of current window in seconds>, and the values are stored in a sorted set (the structure that zincrby operates on) using the following simple format:

    {
        "word1": 5,
        "word2": 3
    }

    This bolt uses the Redis expireat command to discard old data after 5 minutes. These lines of code perform the key work:

          self.conn.zincrby(name...
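The windowing scheme described above can be sketched as follows. This is a minimal illustration, not the book's actual rollingcount.py: `conn` stands for any redis-py client, and the helper names (`window_key`, `count_word`, `top_n`) are hypothetical. Note that the zincrby argument order shown follows redis-py 3.x:

```python
import time

WINDOW_SECONDS = 60   # each window holds 60 seconds of counts
NUM_WINDOWS = 5       # five windows cover the 5-minute rolling range

def window_start(now):
    """Start time (in seconds) of the 60-second window containing `now`."""
    return int(now) - int(now) % WINDOW_SECONDS

def window_key(now):
    """Redis key for the window containing `now`, e.g. twitter_word_count:1700000040."""
    return "twitter_word_count:%d" % window_start(now)

def count_word(conn, word, now=None):
    """Increment `word` in the current window's sorted set and schedule expiry."""
    now = time.time() if now is None else now
    key = window_key(now)
    conn.zincrby(key, 1, word)  # redis-py 3.x argument order: (name, amount, value)
    # expireat discards the whole window once it falls out of the 5-minute range.
    conn.expireat(key, window_start(now) + WINDOW_SECONDS * NUM_WINDOWS)

def top_n(conn, n, now=None):
    """Merge the five most recent windows and return the n top-scoring words."""
    now = time.time() if now is None else now
    keys = ["twitter_word_count:%d" % (window_start(now) - i * WINDOW_SECONDS)
            for i in range(NUM_WINDOWS)]
    conn.zunionstore("twitter_word_count:merged", keys)
    return conn.zrevrange("twitter_word_count:merged", 0, n - 1, withscores=True)
```

Because each window lives under its own time-stamped key, no explicit cleanup pass is needed: expireat lets Redis drop stale windows on its own, and a union over the last five keys yields the rolling 5-minute ranking.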