YARN Essentials

A short introduction to Hadoop 1.x and MRv1


We will briefly look at basic Apache Hadoop 1.x and its processing framework, MRv1 (Classic), to get a clear picture of how Apache Hadoop 2.x with MRv2 (YARN) differs in architecture, components, and processing framework.

Apache Hadoop is a scalable, fault-tolerant distributed system for data storage and processing. The core programming model in Hadoop is MapReduce.
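To make the MapReduce programming model concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and the grouping step only imitates what the framework does between the map and reduce stages.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, imitating the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["hadoop yarn", "yarn yarn"])` counts `hadoop` once and `yarn` three times. In a real cluster the map and reduce functions run as separate tasks on different nodes; this sketch only shows the data flow.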

Since 2004, Hadoop has emerged as the de facto standard to store, process, and analyze hundreds of terabytes and even petabytes of data.

The major components in Hadoop 1.x are as follows:

  • NameNode: This keeps the file system metadata (the namespace and the mapping of blocks to DataNodes) in main memory.

  • DataNode: This is where the data resides in the form of blocks.

  • JobTracker: This assigns/reassigns MapReduce tasks to TaskTrackers in the cluster and tracks the status of each TaskTracker.

  • TaskTracker: This executes the task assigned by the JobTracker and sends the status of the task to the JobTracker.
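The division of labor between the JobTracker and the TaskTrackers can be sketched as a toy simulation. This is not Hadoop code: the classes below, the `alive` flag, and the round-robin assignment policy are assumptions made purely for illustration. The sketch shows only the idea that one component assigns tasks, reassigns them when a worker does not respond, and records the status each worker reports.

```python
class TaskTracker:
    """Hypothetical worker that executes tasks and reports their status."""

    def __init__(self, name):
        self.name = name
        self.alive = True  # assumption: stands in for a heartbeat check

    def run(self, task):
        """Execute a task and return a status string for the JobTracker."""
        if not self.alive:
            raise RuntimeError(f"{self.name} is not responding")
        return f"{task} done on {self.name}"


class JobTracker:
    """Hypothetical coordinator that assigns and reassigns tasks."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.status = {}

    def submit(self, tasks):
        """Assign tasks round-robin, moving on to the next tracker on failure."""
        count = len(self.trackers)
        for index, task in enumerate(tasks):
            for offset in range(count):
                tracker = self.trackers[(index + offset) % count]
                try:
                    self.status[task] = tracker.run(task)
                    break
                except RuntimeError:
                    continue  # reassign the task to the next tracker
            else:
                self.status[task] = "failed"  # no tracker could run it
        return self.status
```

For example, with two trackers where the second has `alive = False`, a task first offered to the failed tracker is reassigned to the healthy one. The sketch also hints at the design issue with MRv1: a single JobTracker handles both scheduling and status tracking for every task in the cluster.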
