
Storm Real-time Processing Cookbook

By: Quinton Anderson

Overview of this book

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Storm Real-time Processing Cookbook contains basic to advanced recipes on using Storm for real-time computation.

The book begins with setting up the development environment and then covers log stream processing. This is followed by a real-time payments workflow, distributed RPC, integration with other software such as Hadoop and Apache Camel, and more.
Table of Contents (16 chapters)
Storm Real-time Processing Cookbook
Credits
About the Author
About the Reviewers
www.packtpub.com
Preface
Index

Operational classification of transactional streams using Random Forest


Now that you have built a classification model, you need to implement an operational topology that uses this model to perform classification as part of a larger operational data pipeline. I would like to draw a distinction between an operational data pipeline and an operational process. I use the term operational process for an architectural concern that potentially involves many system layers, such as ERP, CRM, and the core processing engine. An operational process is, therefore, positioned at the solution architecture level and, as stated in the previous recipe, it typically won't include an analytics platform such as R. A data pipeline, by contrast, belongs at Trident's level of abstraction. Trident, in effect, allows you to define a streaming data pipeline; it abstracts away the management of state and the planning needed to run this pipeline at scale and in parallel.
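The paragraph above describes applying an already-built classification model to each tuple flowing through a streaming pipeline. The following is a minimal, standalone Java sketch of that idea, not the recipe's implementation. It assumes the Random Forest was trained offline and represents it as three hypothetical hand-written decision stumps (`exampleForest`). It also assumes a hypothetical `Transaction` tuple with `amount` and `hourOfDay` features and hypothetical `FRAUD`/`OK` labels. Classification follows the standard Random Forest rule: every tree votes and the majority label wins. Storm is deliberately not imported so that the sketch runs on its own. In a Trident topology, the per-tuple `classify` call would typically sit inside a function (for example, a subclass of Trident's `BaseFunction`) applied to the stream with `each`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: majority-vote classification with a pre-built forest,
// applied tuple by tuple as one step of a streaming pipeline.
public class ForestClassifierSketch {

    /** Hypothetical transaction tuple; the feature fields are assumptions for illustration. */
    public static final class Transaction {
        public final String id;
        public final double amount;
        public final int hourOfDay;

        public Transaction(String id, double amount, int hourOfDay) {
            this.id = id;
            this.amount = amount;
            this.hourOfDay = hourOfDay;
        }
    }

    // Each "tree" maps one transaction to a class label.
    private final List<Function<Transaction, String>> trees;

    public ForestClassifierSketch(List<Function<Transaction, String>> trees) {
        this.trees = new ArrayList<>(trees);
    }

    /** Random Forest prediction: every tree votes and the most frequent label wins. */
    public String classify(Transaction t) {
        Map<String, Integer> votes = new HashMap<>();
        for (Function<Transaction, String> tree : trees) {
            votes.merge(tree.apply(t), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> vote : votes.entrySet()) {
            // Ties are broken arbitrarily; an odd number of trees avoids them for two classes.
            if (vote.getValue() > bestCount) {
                best = vote.getKey();
                bestCount = vote.getValue();
            }
        }
        return best;
    }

    /** Applies the model to each tuple in order, like a per-tuple step in a pipeline. */
    public List<String> classifyAll(List<Transaction> stream) {
        List<String> labels = new ArrayList<>();
        for (Transaction t : stream) {
            labels.add(classify(t));
        }
        return labels;
    }

    /** Three hand-written stumps standing in for trees trained offline (hypothetical rules). */
    public static ForestClassifierSketch exampleForest() {
        List<Function<Transaction, String>> trees = new ArrayList<>();
        trees.add(t -> t.amount > 1000 ? "FRAUD" : "OK");
        trees.add(t -> t.hourOfDay < 6 ? "FRAUD" : "OK");
        trees.add(t -> t.amount > 500 && t.hourOfDay < 6 ? "FRAUD" : "OK");
        return new ForestClassifierSketch(trees);
    }

    public static void main(String[] args) {
        ForestClassifierSketch forest = exampleForest();
        List<Transaction> stream = new ArrayList<>();
        stream.add(new Transaction("t1", 2000.0, 3));
        stream.add(new Transaction("t2", 50.0, 14));
        stream.add(new Transaction("t3", 700.0, 2));
        List<String> labels = forest.classifyAll(stream);
        for (int i = 0; i < stream.size(); i++) {
            System.out.println(stream.get(i).id + " -> " + labels.get(i));
        }
    }
}
```

Keeping the model behind a single `classify` method reflects the separation drawn above. The analytics platform produces the model, and the pipeline only evaluates it per tuple, so the same logic could be wrapped in whatever per-tuple operation the streaming framework provides.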

This recipe will therefore present operational...