Mastering Apache Storm

By: Ankit Jain
Overview of this book

Apache Storm is a real-time Big Data processing framework that processes large amounts of data reliably, guaranteeing that every message will be processed. Storm lets you scale your processing as your data grows, making it an excellent platform for solving your big data problems. This extensive guide covers Storm from the basics to advanced topics. The book begins with a detailed introduction to real-time processing and where Storm fits in. You’ll learn how to deploy Storm on clusters by writing a basic Storm Hello World example. Next, we’ll introduce you to Trident, and you’ll get a clear understanding of how to develop and deploy a Trident topology. We cover topics such as monitoring, Storm parallelism, schedulers, and log processing in an easy-to-understand manner. You will also learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, Kafka, and Hadoop to realize the full potential of Storm. With real-world examples and clear explanations, this book will give you a thorough mastery of Apache Storm. You will be able to use this knowledge to develop efficient, distributed real-time applications that meet your business needs.


In this section, we covered how to read tweets using the Twitter streaming API, how to extract the tweet text from the input JSON records, how to calculate the sentiment of each tweet, and how to store the final output in HDFS.
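As a quick recap of the per-tweet steps summarized above, here is a minimal sketch in plain Python. It is not the implementation from this chapter. The function names and word lists are hypothetical, the sentiment score is a deliberately naive word count, and the sketch assumes each record carries the tweet body in a `text` field. Reading from the streaming API and writing to HDFS are left out; the last function only builds the line that a later stage would store.

```python
import json

# Hypothetical word lists, for illustration only; not taken from the book.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad"}


def extract_text(raw_record):
    """Return the tweet text from one JSON record, or None if it is missing or malformed."""
    try:
        record = json.loads(raw_record)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    text = record.get("text")  # assumption: the tweet body lives under "text"
    return text if isinstance(text, str) else None


def score_sentiment(text):
    """Count positive words minus negative words (a naive stand-in for a real model)."""
    words = [w.strip(".,!?#@\"'").lower() for w in text.split()]
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    return positives - negatives


def to_output_line(raw_record):
    """Build the tab-separated line that a later stage would write to storage."""
    text = extract_text(raw_record)
    if text is None:
        return None  # skip records without usable text
    score = score_sentiment(text)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    # Collapse whitespace so the text cannot break the one-line-per-tweet format.
    return f"{label}\t{' '.join(text.split())}"
```

In a topology, each of these functions would roughly correspond to the work done inside one processing step, with records that fail to parse being dropped rather than failing the whole stream.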

With this, we come to the end of the book. Over its course, we have gone from taking our first steps with Apache Storm to developing real-world applications with it. Here, we would like to summarize everything that we have learned.

We introduced you to the basic concepts and components of Storm, and covered how to write a topology and deploy and run it in both local and clustered mode. We walked through the basic Storm commands and showed how to modify the parallelism of a Storm topology at runtime. We also dedicated an entire chapter to monitoring Storm, an area that is often neglected during development but is critical in any production setting. You also learned about Trident, which...