
Storm Real-time Processing Cookbook

By: Quinton Anderson

Overview of this book

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Storm Real-time Processing Cookbook offers basic to advanced recipes on using Storm for real-time computation.

The book begins with setting up the development environment and then teaches log stream processing. This is followed by a real-time payments workflow, distributed RPC, integration with other software such as Hadoop and Apache Camel, and more.
Table of Contents (16 chapters)
Storm Real-time Processing Cookbook
Credits
About the Author
About the Reviewers
www.packtpub.com
Preface
Index

Integrating the batch and real-time views


The final step in completing the big data architecture is largely done already, and it is surprisingly simple, as is often the case with good functional-style designs.

How to do it…

This recipe simply extends the existing TF-IDF DRPC query that we defined in Chapter 4, Distributed Remote Procedure Calls. We need three new state sources that represent the D, DF, and TF values computed in the Batch layer. We will combine the values from these states with the existing state before performing the final TF-IDF calculation.
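The combine-then-calculate idea can be illustrated in plain Java, independent of the Storm and Trident APIs. This is a minimal sketch under stated assumptions, not the book's implementation: the class and method names (`TfIdfSketch`, `combine`, `tfIdf`) are hypothetical, the sketch assumes the batch and real-time counts cover disjoint time windows so they can simply be added, and it assumes a smoothed IDF of the form tf × ln(D / (1 + DF)). The formula used in the Chapter 4 query may differ.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not the book's code): merges batch-layer and
 * real-time counts for D, DF and TF, then computes a TF-IDF score.
 */
final class TfIdfSketch {

    private TfIdfSketch() {}

    /**
     * Assumption: the batch count covers all data before the current hour and the
     * real-time count covers only the current hour, so the totals can be summed.
     */
    static long combine(long batchCount, long realtimeCount) {
        return batchCount + realtimeCount;
    }

    /**
     * Assumed formula: tf * ln(d / (1 + df)).
     * The +1 in the denominator avoids division by zero when df is 0.
     */
    static double tfIdf(long d, long df, long tf) {
        return tf * Math.log((double) d / (1.0 + df));
    }

    public static void main(String[] args) {
        long d = combine(990, 10);  // total documents: 1000
        long df = combine(95, 4);   // documents containing the term: 99
        long tf = combine(2, 1);    // occurrences of the term in one document: 3
        double score = tfIdf(d, df, tf);
        // prints 6.9078 (3 * ln(1000 / 100) = 3 * ln(10))
        System.out.println(String.format(Locale.ROOT, "%.4f", score));
    }
}
```

Summing is only valid because the two views count disjoint periods. If the batch view also included the current hour, that hour's documents and terms would be counted twice.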

  1. Start from the inside out by creating the combination function called BatchCombiner within the storm.cookbook.tfidf.function package and implementing the logic to combine two versions of the same state. One version should be from the current hour, and the other from all the data prior to the current hour:

    public void execute(TridentTuple tuple, TridentCollector collector) {
        try {
         double d_rt = (double) tuple.getLongByField...