Mastering Hadoop

By: Karanth

Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you want to use Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not running as fast as you expect? Are you looking to understand the benefits of upgrading Hadoop? If the answer to any of these is yes, this book is for you. It assumes novice-level familiarity with Hadoop.

Developing YARN applications


YARN opens Hadoop to computing paradigms other than MapReduce. In Hadoop 2.X, MapReduce, Pig, and Hive are all implemented as Application Master libraries with corresponding clients. Developers can write their own applications against the YARN API and run them on the existing Hadoop infrastructure. Enterprises that already hold large data assets in HDFS can process that data with custom applications, without provisioning new clusters or migrating the existing data.

Storm is a real-time stream-processing engine that has been ported onto YARN. It brings in the paradigm of moving data to the compute nodes, in contrast to MapReduce, which moves computation to the data. Spark is another project that runs on YARN and uses the existing Hadoop infrastructure to provide in-memory data transformations, including MapReduce-style operations. A number of other projects in development demonstrate Hadoop's capability as a generic cluster-computing platform.

In this section, let's look at how to write a simple YARN application. The application takes...