Storm Blueprints: Patterns for Distributed Real-time Computation

Summary


In this chapter, we covered several topics. First, we saw a blueprint for converting a batch processing mechanism built on Pig into a real-time system implemented in Storm. A direct translation of the Pig script does not work, because traditional join operations require a finite set of data and a real-time stream is unbounded. To overcome this problem, we used a shared state pattern with forked streams.
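As a reminder of the idea, the following is a minimal sketch in plain Java rather than the topology built in the chapter. It assumes a stream of impression and click events keyed by campaign. Every name in it (`SharedStateJoinSketch`, `Event`, `Type`, `clickThroughRate`) is hypothetical and chosen only for illustration. Each event is offered to two branches, each branch updates its own counters in shared state, and the "join" becomes a lookup against that state, which can be answered at any moment without waiting for the input to end.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: names are illustrative and not taken from the chapter's code.
final class SharedStateJoinSketch {

    // Event types assumed for this example.
    enum Type { IMPRESSION, CLICK }

    // A single event on the incoming stream, keyed by campaign.
    static final class Event {
        final Type type;
        final String campaign;

        Event(Type type, String campaign) {
            this.type = type;
            this.campaign = campaign;
        }
    }

    // Shared state: one running counter per campaign for each branch.
    private final Map<String, LongAdder> impressions = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> clicks = new ConcurrentHashMap<>();

    // Fork: every event is offered to both branches; each keeps only its own type.
    void process(Event e) {
        impressionBranch(e);
        clickBranch(e);
    }

    private void impressionBranch(Event e) {
        if (e.type == Type.IMPRESSION) {
            impressions.computeIfAbsent(e.campaign, k -> new LongAdder()).increment();
        }
    }

    private void clickBranch(Event e) {
        if (e.type == Type.CLICK) {
            clicks.computeIfAbsent(e.campaign, k -> new LongAdder()).increment();
        }
    }

    // The "join" is a lookup against shared state, valid at any point in the stream.
    double clickThroughRate(String campaign) {
        LongAdder shown = impressions.get(campaign);
        if (shown == null || shown.sum() == 0) {
            return 0.0; // no impressions seen yet for this campaign
        }
        LongAdder clicked = clicks.get(campaign);
        long clickCount = (clicked == null) ? 0 : clicked.sum();
        return (double) clickCount / shown.sum();
    }

    public static void main(String[] args) {
        SharedStateJoinSketch sketch = new SharedStateJoinSketch();
        List<Event> stream = Arrays.asList(
                new Event(Type.IMPRESSION, "campaign-1"),
                new Event(Type.IMPRESSION, "campaign-1"),
                new Event(Type.CLICK, "campaign-1"),
                new Event(Type.IMPRESSION, "campaign-2"));
        for (Event e : stream) {
            sketch.process(e);
        }
        System.out.println(sketch.clickThroughRate("campaign-1"));
    }
}
```

In a real topology the two maps would be replaced by whatever persistent state the deployment uses, and the branches would run as separate stream operations. The sketch only shows why no finite data set is needed: the counters are always current, so the ratio can be read while events are still arriving.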

Second, and perhaps most importantly, we examined Storm-YARN, which allows a user to reuse existing Hadoop infrastructure to deploy Storm. Not only does this give existing Hadoop users a quick way to transition to Storm, it also lets them capitalize on cloud offerings for Hadoop such as Amazon's Elastic MapReduce (EMR). Using EMR, Storm can be deployed quickly to cloud infrastructure and scaled to meet demand.

Finally, as future work, the community is exploring methods to run Pig scripts directly on Storm. This would allow...