Storm Blueprints: Patterns for Distributed Real-time Computation

Designing the topology for our use case


For this example, we will again use Trident and build on the topology that we constructed in the previous chapter. The Trident topology is depicted as follows:

The TwitterSpout performs the search against the Twitter API periodically, emitting the tweets that it returns into a Trident stream. The TweetSplitterFunction then parses the tweets and emits a tuple for each word in the tweets. The WordFrequencyFunction enriches each tuple with the count for that word from a random sample of the English language. Finally, we let Druid consume that information to perform the aggregations over time. Druid partitions the data into temporal slices and persists that data as described previously.
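The data flow through the two functions can be sketched in plain Java, without any Storm or Druid dependencies. This is only an illustration of the stages described above, not the book's implementation: the class `TopologySketch`, the `WordTuple` type, the method names, and the baseline counts are all hypothetical, and the regular expression used to split words is an assumption about how tweets might be tokenized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java stand-in for the Trident stages; no Storm or Druid classes are used.
class TopologySketch {

    // Hypothetical baseline counts, standing in for a random sample of English text.
    static final Map<String, Integer> BASELINE = new HashMap<>();
    static {
        BASELINE.put("the", 5000);
        BASELINE.put("storm", 12);
    }

    // A word paired with its baseline count, mirroring an enriched tuple.
    static class WordTuple {
        final String word;
        final int baselineCount;

        WordTuple(String word, int baselineCount) {
            this.word = word;
            this.baselineCount = baselineCount;
        }
    }

    // Stand-in for TweetSplitterFunction: emits one entry per word in the tweet.
    static List<String> splitTweet(String tweet) {
        List<String> words = new ArrayList<>();
        for (String token : tweet.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Stand-in for WordFrequencyFunction: attaches the baseline count (0 if unknown).
    static WordTuple enrich(String word) {
        return new WordTuple(word, BASELINE.getOrDefault(word, 0));
    }

    // Chains the two stages over a batch of tweets, as the stream would.
    static List<WordTuple> process(List<String> tweets) {
        List<WordTuple> out = new ArrayList<>();
        for (String tweet : tweets) {
            for (String word : splitTweet(tweet)) {
                out.add(enrich(word));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tweets = new ArrayList<>();
        tweets.add("The storm, the calm!");
        for (WordTuple t : process(tweets)) {
            System.out.println(t.word + " baseline=" + t.baselineCount);
        }
    }
}
```

In the real topology, the enriched tuples would be handed to Druid for aggregation over time rather than collected into a list.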

In this case, because the persistence mechanism is our means of addressing fault tolerance and system failure, it should distribute storage and provide both consistency and high availability. Additionally, Hadoop should be capable of using the persistence...