YARN Essentials

Overview of this book

If you have a working knowledge of Hadoop 1.x but want to start afresh with YARN, this book is ideal for you. You will learn to install and administer a YARN cluster and discover the configuration settings that fine-tune it for both performance and scalability. The book also helps you develop, deploy, and run multiple applications and frameworks on the same shared YARN cluster.
Table of Contents (12 chapters)
1. Need for YARN
9. YARN – Alternative Solutions
Index

MRv1 versus MRv2


MRv1 (MapReduce version 1) is part of Apache Hadoop 1.x and is an implementation of the MapReduce programming paradigm.

The MapReduce project itself can be broken into the following parts:

  • End-user MapReduce API: This is the API needed to develop the MapReduce application.

  • MapReduce framework: This is the runtime implementation of various phases, such as the map phase, the sort/shuffle/merge aggregation phase, and the reduce phase.

  • MapReduce system: This is the backend infrastructure required to run MapReduce applications and includes things such as cluster resource management, scheduling of jobs, and so on.
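To make the framework phases listed above concrete, here is a minimal single-process sketch of a word count that passes through a map phase, a sort/shuffle/merge phase, and a reduce phase. It is plain Python, not the Hadoop MapReduce API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Sort/shuffle/merge: group all values by key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped}


def run_job(records):
    """Chain the three phases, as the framework would do at runtime."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(run_job(["yarn runs jobs", "YARN schedules jobs"]))
```

In a real cluster the developer would write only the map and reduce logic against the end-user API, while the framework and the system would handle the grouping step, task placement, and resource management across machines.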

Hadoop 1.x was written solely as a MapReduce (MR) engine. Because it runs on a cluster, its cluster management component was tightly coupled to the MR programming paradigm, so the only thing that could run on Hadoop 1.x was an MR job.

In MRv1, the cluster was managed by a single JobTracker and multiple TaskTrackers running on the DataNodes.
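The single-master layout described above can be pictured with a toy model: one `JobTracker` object that both knows the workers and hands out MR tasks, and several `TaskTracker` objects that receive them. This is only a sketch under stated assumptions. The class and method names are hypothetical, and the round-robin placement is a simplification chosen for brevity; it is not how the real JobTracker schedules work.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class TaskTracker:
    """A worker process; in MRv1 one runs on each DataNode."""
    host: str
    tasks: list = field(default_factory=list)

    def run(self, task):
        # A real TaskTracker would launch the task; here we only record it.
        self.tasks.append(task)


class JobTracker:
    """The single master: it tracks the workers and schedules MR tasks."""

    def __init__(self, trackers):
        self.trackers = list(trackers)

    def submit(self, job_name, num_maps, num_reduces):
        # The master only understands map and reduce tasks.
        tasks = [f"{job_name}-map-{i}" for i in range(num_maps)]
        tasks += [f"{job_name}-reduce-{i}" for i in range(num_reduces)]
        # Assumption: simple round-robin placement across the workers.
        for task, tracker in zip(tasks, cycle(self.trackers)):
            tracker.run(task)
        return {t.host: list(t.tasks) for t in self.trackers}


if __name__ == "__main__":
    jt = JobTracker([TaskTracker("node1"), TaskTracker("node2")])
    print(jt.submit("wordcount", num_maps=3, num_reduces=1))
```

The point of the sketch is the coupling: `submit` accepts nothing but counts of map and reduce tasks, which mirrors why an MRv1 cluster could run only MR jobs.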

In Hadoop 2.x, the old MRv1 framework...