Mastering Hadoop

By: Sandeep Karanth

Chapter 7. Storm on YARN – Low Latency Processing in Hadoop

Hadoop MapReduce builds on the concept of moving computation to data. The data is usually far larger than the program that manipulates it, and the network is typically the slowest component in a distributed data processing system, so it is natural to move the smaller piece, that is, the program itself. With assistance from the NameNode, Hadoop knows where the data resides in a cluster of computers. It uses this data locality information to schedule tasks on appropriate nodes, making a best effort to place each task as close as possible to the data it needs.
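
The locality preference described above can be illustrated with a small sketch. This is not Hadoop's scheduler code; it is a toy model, and the function name `schedule`, the node, rack, and block names, and the three locality labels are all assumptions made for illustration. It only shows the general idea of preferring a node that holds the data, then a node on the same rack, then any free node.

```python
from typing import Dict, List, Tuple


def schedule(block_replicas: Dict[str, List[str]],
             free_slots: Dict[str, int],
             rack_of: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Assign one task per block, preferring the closest free node.

    Hypothetical toy model: real schedulers also weigh queues,
    priorities, and delays, none of which are modelled here.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: Dict[str, Tuple[str, str]] = {}
    for block, replicas in block_replicas.items():
        replica_racks = {rack_of[node] for node in replicas}
        # Candidate nodes grouped by decreasing locality.
        tiers = [
            ("node-local", [n for n in replicas if slots.get(n, 0) > 0]),
            ("rack-local", [n for n in slots
                            if slots[n] > 0 and rack_of[n] in replica_racks]),
            ("off-rack", [n for n in slots if slots[n] > 0]),
        ]
        for level, candidates in tiers:
            if candidates:
                node = candidates[0]
                slots[node] -= 1          # the task occupies one slot
                assignment[block] = (node, level)
                break                     # blocks with no free node stay unassigned
    return assignment


# Illustrative cluster: two nodes on rackA, one on rackB, one slot each.
block_replicas = {"blk_1": ["node1", "node2"], "blk_2": ["node1"], "blk_3": ["node1"]}
free_slots = {"node1": 1, "node2": 1, "node3": 1}
rack_of = {"node1": "rackA", "node2": "rackA", "node3": "rackB"}

result = schedule(block_replicas, free_slots, rack_of)
for block, (node, level) in result.items():
    print(block, "->", node, f"({level})")
```

In this example the first task runs where its data lives, the second falls back to another node on the same rack once `node1` is full, and the third has to read its data across racks. That last case is the costly one that a locality-aware scheduler tries to avoid.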

In this chapter, we will discuss the opposite paradigm, that is, moving data to the computation, also known as the streaming paradigm. Many frameworks facilitate streaming, and Apache Storm is a popular one. Apache Storm integrates with Hadoop YARN, bringing the streaming paradigm to Hadoop. We will cover the following topics:

  • Comparing and contrasting...