YARN Essentials

By: Fasale, Nirmal Kumar

Overview of this book

If you have a working knowledge of Hadoop 1.x but want to start afresh with YARN, this book is ideal for you. You will be able to install and administer a YARN cluster and also discover the configuration settings to fine-tune your cluster both in terms of performance and scalability. This book will help you develop, deploy, and run multiple applications/frameworks on the same shared YARN cluster.
Table of Contents (12 chapters)
1. Need for YARN
9. YARN – Alternative Solutions
Index

NodeManager failures


Almost every node in the cluster runs a NodeManager service daemon. The NodeManager executes part of a YARN job on its own machine, while other parts are executed on other nodes. In a 1,000-node YARN cluster, there are probably around 999 NodeManagers running. NodeManagers are thus per-node agents, each taking care of an individual node in the cluster.

If a NodeManager fails, the ResourceManager detects the failure through a timeout (that is, it stops receiving heartbeats from the NodeManager). The ResourceManager then removes the NodeManager from its pool of available NodeManagers. It also kills all the containers running on that node and reports the failure to all running ApplicationMasters (AMs). The AMs are then responsible for reacting to the node failure by redoing the work done by any containers that were running on that node during the fault.
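The timeout-based detection described above can be illustrated with a small simulation. This is only a sketch in Python, not YARN's actual implementation: the `ToyResourceManager` class, its method names, and the 10-second expiry used in the example are all hypothetical. In a real cluster the expiry interval is set through the `yarn.nm.liveness-monitor.expiry-interval-ms` property. As a simplification, the sketch reports lost containers only to the applications that owned them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for the ResourceManager's liveness tracking."""
    # Assumed timeout in seconds; real clusters configure this through
    # yarn.nm.liveness-monitor.expiry-interval-ms.
    expiry_interval_s: float = 600.0
    last_heartbeat: dict = field(default_factory=dict)  # node -> last timestamp
    containers: dict = field(default_factory=dict)      # node -> {container: app}

    def heartbeat(self, node, now):
        # Record that the node reported in at time `now`.
        self.last_heartbeat[node] = now
        self.containers.setdefault(node, {})

    def launch(self, node, container_id, app_id):
        # Track which application owns a container on a node.
        self.containers[node][container_id] = app_id

    def expire_nodes(self, now):
        """Drop timed-out nodes and return {app_id: [lost container ids]}."""
        expired = [n for n, t in self.last_heartbeat.items()
                   if now - t > self.expiry_interval_s]
        lost = {}
        for node in expired:
            # Remove the node from the pool of available nodes.
            del self.last_heartbeat[node]
            # "Kill" its containers and group them per owning application,
            # so each AM learns which work it has to redo.
            for container_id, app_id in self.containers.pop(node, {}).items():
                lost.setdefault(app_id, []).append(container_id)
        return lost


rm = ToyResourceManager(expiry_interval_s=10.0)
rm.heartbeat("node1", now=0.0)
rm.heartbeat("node2", now=0.0)
rm.launch("node1", "c1", "app_A")
rm.launch("node2", "c2", "app_A")
rm.heartbeat("node2", now=8.0)    # node2 keeps reporting, node1 goes silent
lost = rm.expire_nodes(now=12.0)  # node1 has been silent for 12 s > 10 s
print(lost)                       # {'app_A': ['c1']}
```

Here `node1` is dropped from the pool because its last heartbeat is older than the expiry interval, and the AM for `app_A` is told that container `c1` was lost, so it can reschedule that work elsewhere.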

If the fault causing the timeout is transient, the NodeManager will resynchronize with the ResourceManager...