Storm Real-time Processing Cookbook

By: Quinton Anderson

Overview of this book

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use! Storm Real-time Processing Cookbook offers basic to advanced recipes for real-time computation with Storm. The book begins by setting up the development environment and then covers log stream processing, followed by a real-time payments workflow, distributed RPC, integration with other software such as Hadoop and Apache Camel, and more.

Credits

About the Author

About the Reviewers

www.packtpub.com

Preface

Setting Up Your Development Environment
  Introduction
  Setting up your development environment
  Distributed version control
  Creating a "Hello World" topology
  Creating a Storm cluster – provisioning the machines
  Creating a Storm cluster – provisioning Storm
  Deriving basic click statistics
  Unit testing a bolt
  Implementing an integration test
  Deploying to the cluster

Log Stream Processing
  Introduction
  Creating a log agent
  Creating the log spout
  Rule-based analysis of the log stream
  Indexing and persisting the log data
  Counting and persisting log statistics
  Creating an integration test for the log stream cluster
  Creating a log analytics dashboard

Calculating Term Importance with Trident
  Introduction
  Creating a URL stream using a Twitter filter
  Deriving a clean stream of terms from the documents
  Calculating the relative importance of each term

Distributed Remote Procedure Calls
  Introduction
  Using DRPC to complete the required processing
  Integration testing of a Trident topology
  Implementing a rolling window topology
  Simulating time in integration testing

Polyglot Topology
  Introduction
  Implementing the multilang protocol in Qt
  Implementing the SplitSentence bolt in Qt
  Implementing the count bolt in Ruby
  Defining the word count topology in Clojure

Integrating Storm and Hadoop
  Introduction
  Implementing TF-IDF in Hadoop
  Persisting documents from Storm
  Integrating the batch and real-time views

Real-time Machine Learning
  Introduction
  Implementing a transactional topology
  Creating a Random Forest classification model using R
  Operational classification of transactional streams using Random Forest
  Creating an association rules model in R
  Creating a recommendation engine
  Real-time online machine learning

Continuous Delivery
  Introduction
  Setting up a CI server
  Setting up system environments
  Defining a delivery pipeline
  Implementing automated acceptance testing

Storm on AWS
  Introduction
  Deploying Storm on AWS using Pallet
  Setting up a Virtual Private Cloud
  Deploying Storm into Virtual Private Cloud using Vagrant

Index

Introduction

This chapter presents an implementation of a very well-known data processing algorithm, Term Frequency–Inverse Document Frequency (TF-IDF), using Storm's Trident API. TF-IDF is a numerical statistic that reflects how important a word is to a document within a collection of documents. It is a key concern in search engines, and it is also an important starting point in sentiment mining, because trends in the important words within textual content can be an extremely useful predictor or analytical tool.
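As a point of reference for the statistic itself, the following plain Java sketch (Java 9+ for List.of) computes one common variant of TF-IDF over a small in-memory corpus: tf(t, d) is the number of occurrences of t in d divided by the length of d, and idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. This formulation is an assumption for illustration; the chapter's Trident topology may use a different weighting or smoothing, and it would maintain these counts as streaming state rather than over a fixed list. The class name TfIdf and the sample documents are hypothetical.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical, minimal in-memory TF-IDF sketch (batch, not Trident). */
public class TfIdf {

    /** Term frequency: occurrences of the term divided by the document length. */
    public static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    /** Inverse document frequency: ln(N / df), or 0 if no document contains the term. */
    public static double idf(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        if (df == 0) {
            return 0.0;
        }
        return Math.log((double) corpus.size() / df);
    }

    /** TF-IDF weight of a term in one document relative to the whole corpus. */
    public static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        // Each document is already tokenized into lower-case terms.
        List<List<String>> corpus = List.of(
                List.of("storm", "processes", "streams"),
                List.of("hadoop", "processes", "batches"),
                List.of("storm", "and", "trident"));
        List<String> doc = corpus.get(0);
        // Print each distinct term of the first document with its weight.
        doc.stream().distinct().forEach(term ->
                System.out.printf(Locale.ROOT, "%s -> %.4f%n", term, tfIdf(term, doc, corpus)));
    }
}
```

Running main prints storm -> 0.1352, processes -> 0.1352, and streams -> 0.3662. "streams" occurs in only one of the three documents, so it receives the highest weight, while terms shared with other documents are discounted.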

Tip

TF-IDF drives many search engines and search libraries, such as Apache Lucene. For details of how it is used in this context, read the documentation for the Similarity class in Apache Lucene at http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/search/Similarity.html.
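To show how a search library can adjust the raw statistic, here is a standalone Java sketch of the tf and idf factors as we understand them from the classic DefaultSimilarity in Lucene 2.x/3.x: tf is the square root of the raw term count, and idf is 1 + ln(numDocs / (docFreq + 1)). The sketch does not call Lucene, the class name ClassicSimilaritySketch is hypothetical, and the Similarity documentation linked above remains the authoritative description, including the norm and boost factors omitted here.

```java
import java.util.Locale;

/**
 * Hypothetical standalone sketch of the tf and idf factors attributed to
 * Lucene's classic DefaultSimilarity. It does not depend on Lucene.
 */
public class ClassicSimilaritySketch {

    /** Dampened term frequency: square root of the raw occurrence count. */
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Smoothed inverse document frequency: 1 + ln(numDocs / (docFreq + 1)). */
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Example: a term occurring 4 times in a document and found in 9 of 1000 documents.
        float tf = tf(4f);
        float idf = idf(9, 1000);
        System.out.printf(Locale.ROOT, "tf=%.2f idf=%.4f weight=%.4f%n", tf, idf, tf * idf);
    }
}
```

The square root dampens repeated occurrences of a term within one document, and the +1 terms keep idf finite when docFreq is 0 and close to 1 for a term found in every document.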

According to the Storm project wiki (https://github.com/nathanmarz/storm/wiki/Trident-tutorial), Trident is a new high-level abstraction for doing real-time computing on top of Storm. It allows...