Book Image

YARN Essentials

Book Image

YARN Essentials

Overview of this book

If you have a working knowledge of Hadoop 1.x but want to start afresh with YARN, this book is ideal for you. You will be able to install and administer a YARN cluster and also discover the configuration settings to fine-tune your cluster both in terms of performance and scalability. This book will help you develop, deploy, and run multiple applications/frameworks on the same shared YARN cluster.
Table of Contents (12 chapters)
Free Chapter
1
1. Need for YARN
9
9. YARN – Alternative Solutions
11
Index

Chapter 1. Need for YARN

YARN stands for Yet Another Resource Negotiator. YARN is a generic resource platform to manage resources in a typical cluster. YARN was introduced with Hadoop 2.0, which is an open source distributed processing framework from the Apache Software Foundation.

In 2012, YARN became one of the subprojects of the larger Apache Hadoop project. YARN is also coined by the name of MapReduce 2.0. This is since Apache Hadoop MapReduce has been re-architectured from the ground up to Apache Hadoop YARN.

Think of YARN as a generic computing fabric to support MapReduce and other application paradigms within the same Hadoop cluster; earlier, this was limited to batch processing using MapReduce. This really changed the game to recast Apache Hadoop as a much more powerful data processing system. With the advent of YARN, Hadoop now looks very different compared to the way it was only a year ago.

YARN enables multiple applications to run simultaneously on the same shared cluster and allows applications to negotiate resources based on need. Therefore, resource allocation/management is central to YARN.

YARN has been thoroughly tested at Yahoo! since September 2012. It has been in production across 30,000 nodes and 325 PB of data since January 2013.

Recently, Apache Hadoop YARN won the Best Paper Award at ACM Symposium on Cloud Computing (SoCC) in 2013!