Mastering Hadoop

By: Karanth

Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not working as fast as you intend? Are you looking to understand the benefits of upgrading Hadoop? If the answer is yes to any of these, this book is for you. It assumes novice-level familiarity with Hadoop.

Summary


YARN has opened up the Hadoop ecosystem to a wide range of applications. It has not only alleviated the scaling bottlenecks of traditional MapReduce-based Hadoop but also helped organizations use their infrastructure more efficiently. This was made possible by:

  • Separating application-specific logic from resource management. The ResourceManager is responsible solely for cluster resource management and is agnostic to the applications that run on the cluster.

  • Providing common and generic abstractions for resource specifications. Resources are specified in terms of cores and memory.

  • Maintaining backward compatibility with existing Hadoop APIs. Existing Hadoop programs run on YARN after recompilation, without any code changes.

  • Providing a variety of pluggable scheduling policies, such as FairScheduler and CapacityScheduler. Pluggable policies make it easy for other paradigms to come on board.
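
The points above can be illustrated with a small, self-contained sketch. It is not the YARN API: `YarnResourceSketch`, `Resource`, `Request`, `SchedulingPolicy`, `FifoPolicy`, and `SmallestFirstPolicy` are hypothetical names invented for this illustration, and the two policies are deliberately simplistic stand-ins rather than FairScheduler or CapacityScheduler. The sketch assumes a single pool of capacity and shows a resource specification in terms of memory and cores, an allocator that never looks at what an application does, and a policy that can be swapped without touching the allocator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model for illustration only; not the real YARN classes.
final class YarnResourceSketch {

    // A resource specification in the two dimensions the text names: memory and cores.
    static final class Resource {
        final int memoryMb;
        final int vcores;
        Resource(int memoryMb, int vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }
        // True when this request fits inside the remaining free capacity.
        boolean fitsIn(Resource free) { return memoryMb <= free.memoryMb && vcores <= free.vcores; }
        // Capacity left after subtracting a granted request.
        Resource minus(Resource used) { return new Resource(memoryMb - used.memoryMb, vcores - used.vcores); }
    }

    // A request from some application; the allocator only sees an ID and a resource ask.
    static final class Request {
        final String appId;
        final Resource ask;
        Request(String appId, Resource ask) { this.appId = appId; this.ask = ask; }
    }

    // Pluggable policy: decides the order in which pending requests are considered.
    interface SchedulingPolicy {
        List<Request> order(List<Request> pending);
    }

    // Assumed toy policy: consider requests in arrival order.
    static final class FifoPolicy implements SchedulingPolicy {
        @Override public List<Request> order(List<Request> pending) { return new ArrayList<>(pending); }
    }

    // Assumed toy policy: consider the smallest memory ask first.
    static final class SmallestFirstPolicy implements SchedulingPolicy {
        @Override public List<Request> order(List<Request> pending) {
            List<Request> sorted = new ArrayList<>(pending);
            sorted.sort(Comparator.comparingInt((Request r) -> r.ask.memoryMb));
            return sorted;
        }
    }

    // Application-agnostic allocation: grant each request that still fits, in policy order.
    static List<String> allocate(Resource capacity, List<Request> pending, SchedulingPolicy policy) {
        List<String> granted = new ArrayList<>();
        Resource free = capacity;
        for (Request r : policy.order(pending)) {
            if (r.ask.fitsIn(free)) {
                free = free.minus(r.ask);
                granted.add(r.appId);
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        Resource cluster = new Resource(8192, 4);
        List<Request> pending = Arrays.asList(
            new Request("mapreduce-job", new Resource(6144, 2)),
            new Request("graph-job", new Resource(2048, 1)),
            new Request("stream-job", new Resource(4096, 2)));
        System.out.println(allocate(cluster, pending, new FifoPolicy()));          // [mapreduce-job, graph-job]
        System.out.println(allocate(cluster, pending, new SmallestFirstPolicy())); // [graph-job, stream-job]
    }
}
```

Swapping the policy object changes which applications are granted resources, while `allocate` stays unchanged and never inspects application logic. That is the separation this sketch is meant to mirror, under the simplifying assumptions stated above.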

Development of newer computing paradigms on Hadoop is as simple as implementing a client and Application Master...