Storm Blueprints: Patterns for Distributed Real-time Computation

Introducing the word count topology data flow

Our word count topology (depicted in the following diagram) will consist of a single spout connected to three downstream bolts.

Word count topology

Sentence spout

The SentenceSpout class will simply emit a stream of single-value tuples, each with one field named "sentence" whose value is a string (a sentence), as shown in the following representation:

{ "sentence":"my dog has fleas" }

To keep things simple, the source of our data will be a static list of sentences that we loop over, emitting a tuple for every sentence. In a real-world application, a spout would typically connect to a dynamic source, such as tweets retrieved from the Twitter API.
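The looping behavior described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Storm's spout API, the class name SentenceSourceSketch and its nextTuple method are hypothetical, and a Map stands in for a tuple.

```java
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the sentence spout (hypothetical; not Storm's API).
class SentenceSourceSketch {
    private final List<String> sentences;
    private int index = 0;

    SentenceSourceSketch(List<String> sentences) {
        if (sentences.isEmpty()) {
            throw new IllegalArgumentException("need at least one sentence");
        }
        this.sentences = List.copyOf(sentences);
    }

    // Returns the next single-field tuple and wraps around at the end of the list.
    Map<String, String> nextTuple() {
        String sentence = sentences.get(index);
        index = (index + 1) % sentences.size();
        return Map.of("sentence", sentence);
    }
}
```

Calling nextTuple() repeatedly cycles through the list indefinitely, which mimics an unbounded stream backed by a static data set.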

Introducing the split sentence bolt

The split sentence bolt will subscribe to the sentence spout's tuple stream. For each tuple received, it will look up the value of the "sentence" field, split that value into words, and emit a tuple for each word:

{ "word" : "my" }
{ "word" : "dog" }
{ "word" : "has" }
{ "word" : "fleas" }
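The splitting step might look like the following plain-Java sketch. It assumes words are separated by whitespace, which the text does not specify, and the class name SplitSentenceSketch is hypothetical rather than part of Storm or of the chapter's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the split sentence bolt (hypothetical; not Storm's API).
class SplitSentenceSketch {
    // Turns one "sentence" tuple into one "word" tuple per word.
    static List<Map<String, String>> split(Map<String, String> sentenceTuple) {
        String sentence = sentenceTuple.get("sentence");
        List<Map<String, String>> wordTuples = new ArrayList<>();
        // Assumption: words are delimited by runs of whitespace.
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                wordTuples.add(Map.of("word", word));
            }
        }
        return wordTuples;
    }
}
```

For the tuple { "sentence":"my dog has fleas" }, split returns the four word tuples listed above, in order.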

Introducing the word count bolt

The word count bolt subscribes to the output of the SplitSentenceBolt class, keeping a running count of how many times it has seen each word. Whenever it receives a tuple, it will increment the counter associated with that word and emit a tuple containing the word and its current count:

{ "word" : "dog", "count" : 5 }
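A minimal sketch of the running count, again in plain Java rather than Storm's API. The class name WordCountSketch and the choice of an in-memory HashMap with Long counters are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the word count bolt (hypothetical; not Storm's API).
class WordCountSketch {
    // Running count per word, held in memory for the lifetime of this object.
    private final Map<String, Long> counts = new HashMap<>();

    // Increments the counter for the incoming word and returns the tuple to emit.
    Map<String, Object> count(Map<String, String> wordTuple) {
        String word = wordTuple.get("word");
        long current = counts.merge(word, 1L, Long::sum);
        return Map.of("word", word, "count", current);
    }
}
```

Feeding the tuple { "word" : "dog" } to the same instance five times yields a tuple with a count of 5 on the fifth call, matching the example above.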

Introducing the report bolt

The report bolt subscribes to the output of the WordCountBolt class and maintains a table of all words and their corresponding counts, just like WordCountBolt. When it receives a tuple, it updates the table and prints the contents to the console.
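The report step can be sketched the same way. This plain-Java example is an assumption-laden illustration: the class name ReportSketch, the "word : count" output format, and the use of a TreeMap to print words in sorted order are all choices made here, not details taken from the text.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the report bolt (hypothetical; not Storm's API).
class ReportSketch {
    // Assumption: a TreeMap keeps the printed table in alphabetical order.
    private final Map<String, Long> table = new TreeMap<>();

    // Stores the latest count for the word, then prints the whole table.
    void receive(Map<String, Object> countTuple) {
        String word = (String) countTuple.get("word");
        Long count = (Long) countTuple.get("count");
        table.put(word, count);
        System.out.print(render());
    }

    // Formats the table as one "word : count" line per word.
    String render() {
        StringBuilder sb = new StringBuilder();
        table.forEach((w, c) -> sb.append(w).append(" : ").append(c).append('\n'));
        return sb.toString();
    }
}
```

Because each incoming tuple already carries the current count, the table only overwrites the stored value for that word; it never needs to do its own counting.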