Storm Blueprints: Patterns for Distributed Real-time Computation

Establishing the architecture


We touched on Hadoop in the previous chapter, but we focused mainly on the MapReduce mechanism within Hadoop. In this chapter, we will do the opposite and focus on the Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN). We will use HDFS to stage the data and YARN to deploy the Storm framework that will host the topology.

The recent componentization within Hadoop allows any distributed system to use it for resource management. In Hadoop 1.0, resource management was embedded into the MapReduce framework as shown in the following diagram:

Hadoop 2.0 separates out resource management into YARN, allowing other distributed processing frameworks to run on the resources managed under the Hadoop umbrella. In our case, this allows us to run Storm on YARN as shown in the following diagram:
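The same separation can be sketched in code. The following toy model is only an illustration and does not use the YARN API: the `ResourceManager` class, its `allocate` method, and the container counts are all hypothetical names and values chosen for this sketch. It shows one resource pool that knows nothing about job logic handing out containers to two different frameworks, which is roughly the role YARN plays for MapReduce and Storm.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy stand-in for YARN (hypothetical): tracks containers, not jobs."""
    total_containers: int
    allocations: dict = field(default_factory=dict)

    def available(self) -> int:
        # Containers not yet granted to any framework.
        return self.total_containers - sum(self.allocations.values())

    def allocate(self, framework: str, requested: int) -> int:
        """Grant up to `requested` containers to the named framework."""
        granted = min(requested, self.available())
        self.allocations[framework] = self.allocations.get(framework, 0) + granted
        return granted


# Hadoop 2.0 style: one resource pool shared by several frameworks.
cluster = ResourceManager(total_containers=10)
mapreduce_granted = cluster.allocate("mapreduce", 6)
storm_granted = cluster.allocate("storm", 6)  # only 4 containers remain

print(cluster.allocations)  # {'mapreduce': 6, 'storm': 4}
```

In a Hadoop 1.0 style design, the allocation logic would live inside the MapReduce framework itself, so a second framework such as Storm would have no shared pool to request resources from.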

As shown in the preceding diagram, Storm fulfills the same function as MapReduce: it provides a framework for distributed computation. In this specific...