Storm Real-time Processing Cookbook

By: Quinton Anderson

Overview of this book

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use! Storm Real-time Processing Cookbook offers basic to advanced recipes on Storm for real-time computation.

The book begins with setting up the development environment and then teaches log stream processing. This is followed by a real-time payments workflow, distributed RPC, integration with other software such as Hadoop and Apache Camel, and more.

Real-time online machine learning


Predictive analytics is largely iterative and interactive in nature; however, in all the previous examples, there is a clear distinction between the learning phase and the scoring phase in the life of a model. With online learning algorithms, this line blurs: an online learning algorithm learns continuously from a stream of updated training data. Algorithms are therefore said to be either batch-based or online. Note that either kind can be real-time. In the batch-based case, a model is built in an offline batch process and then deployed into Storm for real-time scoring. In the online case, the algorithm both learns and scores as it sees new data, and it is likewise deployed into Storm, which serves as the real-time processing engine.
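
The difference between the two modes can be illustrated with a deliberately trivial model. The sketch below is not taken from the book and does not use Storm; `MeanModel`, `batch_then_score`, and `online_score_and_learn` are hypothetical names, and the "model" simply predicts the running mean of the targets it has seen. It is meant only to show where learning happens relative to scoring.

```python
from typing import Iterable, List


class MeanModel:
    """Toy model (an assumption for illustration): predicts the mean of seen targets."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def learn(self, target: float) -> None:
        # Incremental mean update, so no past data needs to be stored.
        self.count += 1
        self.mean += (target - self.mean) / self.count

    def score(self) -> float:
        return self.mean


def batch_then_score(training: Iterable[float], live: Iterable[float]) -> List[float]:
    """Batch-based: learn offline first, then score live data with a frozen model."""
    model = MeanModel()
    for target in training:
        model.learn(target)
    return [model.score() for _ in live]


def online_score_and_learn(live: Iterable[float]) -> List[float]:
    """Online: score each live item, then immediately learn from it."""
    model = MeanModel()
    predictions = []
    for target in live:
        predictions.append(model.score())  # score with what is known so far
        model.learn(target)                # then fold the new item into the model
    return predictions
```

In the batch-based function the predictions never change once the model is deployed, whereas in the online function every prediction reflects all data seen up to that point.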

This recipe implements an online Regression Perceptron. The name simply means that the algorithm learns in an online manner and the predictions...
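
As a rough illustration of the idea, the following is a minimal sketch of an online perceptron used for regression. It is not the recipe's implementation: it assumes that "regression" here means predicting a real-valued output with a linear unit, and it uses the standard delta rule (least-mean-squares) update. The class name `OnlineRegressionPerceptron`, its methods, and the default learning rate are all hypothetical choices made for this example.

```python
from typing import List, Sequence


class OnlineRegressionPerceptron:
    """Single linear unit trained one example at a time (hypothetical sketch)."""

    def __init__(self, n_features: int, learning_rate: float = 0.05) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: Sequence[float]) -> float:
        # Linear output: no threshold is applied, so the result is real-valued.
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features: Sequence[float], target: float) -> float:
        """Score the example, then learn from it. Returns the pre-update prediction."""
        prediction = self.predict(features)
        error = target - prediction
        # Delta rule: move each weight in proportion to the error and its input.
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias += self.learning_rate * error
        return prediction


def train_on_stream(model: OnlineRegressionPerceptron,
                    xs: Sequence[float], epochs: int) -> None:
    """Feed a repeating stream of (x, 2x + 1) pairs to the model."""
    for _ in range(epochs):
        for x in xs:
            model.update([x], 2.0 * x + 1.0)
```

Each call to `update` both scores and learns, which mirrors the blurred learning and scoring phases described above. In a Storm topology, logic of this kind would live inside a bolt that keeps the weights as state; that wiring is omitted here.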