YARN Essentials


Storm-YARN


Apache Storm is an open source, distributed, real-time computation system that was originally open-sourced by Twitter.

Storm processes unbounded streams of data reliably and can be used with any programming language. Common use cases include real-time analytics, real-time machine learning, continuous computation, and ETL.

Storm-YARN is a project from Yahoo that enables a Storm cluster to be deployed and managed by YARN. Previously, Hadoop and Storm each needed a separate cluster.

One major benefit of this integration is elasticity. Batch processing (Hadoop MapReduce) usually runs on demand, whereas real-time processing (Storm) runs continuously. When the Hadoop cluster is idle, its spare capacity can be used for real-time processing work.
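
The elasticity argument can be illustrated with a toy model. The sketch below is not how YARN schedules containers; it only assumes a fixed pool of containers shared by batch and real-time work, and the class name `SharedCluster`, the method `spare_for_streaming`, and all the numbers are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SharedCluster:
    """Toy model of one cluster shared by batch and streaming work (hypothetical)."""
    total_containers: int

    def spare_for_streaming(self, batch_demand: int) -> int:
        """Containers left over once batch jobs have taken what they asked for."""
        used_by_batch = min(batch_demand, self.total_containers)
        return self.total_containers - used_by_batch


# Hypothetical figures: a 100-container cluster and a batch load that varies by hour.
cluster = SharedCluster(total_containers=100)
hourly_batch_demand = [80, 60, 10, 0, 0, 40]

# Capacity that real-time processing could borrow in each hour.
spare = [cluster.spare_for_streaming(d) for d in hourly_batch_demand]
print(spare)  # [20, 40, 90, 100, 100, 60]
```

With two separate clusters, the containers left idle by batch jobs in the quiet hours would simply go unused; on a shared cluster they become available to the real-time workload.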

In a typical real-time processing use case, constant and predictable loads are very rare. Storm, therefore, will need more resources during peak time when the load is greater. At peak time, Storm...