Learning Hadoop 2

Walking through a run of a MapReduce job


To explore the relationship between mapper and reducer in more detail, and to expose some of Hadoop's inner workings, we'll now walk through how a MapReduce job is executed. The walkthrough applies to MapReduce in both Hadoop 1 and Hadoop 2, even though the latter is implemented very differently using YARN, which we'll discuss later in this chapter. Additional information on the services described in this section, as well as suggestions for troubleshooting MapReduce applications, can be found in Chapter 10, Running a Hadoop Cluster.

Startup

The driver is the only piece of code that runs on our local machine, and the call to Job.waitForCompletion() starts the communication with the JobTracker, which is the master node in the MapReduce system. The JobTracker is responsible for all aspects of job scheduling and execution, so it becomes our primary interface when performing any task related to job management.
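To make the driver's role concrete, the following is a minimal toy model of the hand-off described above, written in plain Java so that it runs without a cluster. It is a sketch under stated assumptions, not Hadoop code: ToyJobTracker, ToyJob, and DriverSketch are hypothetical names introduced for illustration, and a single background thread stands in for the remote master. The only behavior it borrows from the real Job.waitForCompletion() is the general shape of the call: submit the job, block while polling for completion, and return a boolean indicating success.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical driver: the only code that runs on "our local machine".
class DriverSketch {
    public static void main(String[] args) {
        ToyJobTracker tracker = new ToyJobTracker();
        try {
            // The Runnable is a placeholder for the map and reduce work.
            ToyJob job = new ToyJob(tracker, () -> { });
            boolean succeeded = job.waitForCompletion(50);
            System.out.println("Job succeeded: " + succeeded);
        } finally {
            tracker.shutdown();
        }
    }
}

// Hypothetical stand-in for the master service; not a Hadoop class.
class ToyJobTracker {
    // One background thread plays the part of the remote cluster.
    private final ExecutorService cluster = Executors.newSingleThreadExecutor();

    // Accept a job and run it asynchronously, reporting success or failure.
    Future<Boolean> submit(Runnable work) {
        return cluster.submit(() -> {
            try {
                work.run();
                return true;
            } catch (RuntimeException e) {
                return false;
            }
        });
    }

    void shutdown() {
        cluster.shutdown();
    }
}

// Hypothetical stand-in for a job handle held by the driver.
class ToyJob {
    private final ToyJobTracker tracker;
    private final Runnable work;

    ToyJob(ToyJobTracker tracker, Runnable work) {
        this.tracker = tracker;
        this.work = work;
    }

    // Submit the job, then poll until the tracker reports it finished.
    boolean waitForCompletion(long pollMillis) {
        Future<Boolean> handle = tracker.submit(work);
        try {
            while (!handle.isDone()) {
                Thread.sleep(pollMillis);
            }
            return handle.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }
}
```

In a real driver, the equivalent step is typically a single call such as job.waitForCompletion(true), whose boolean result is commonly used to choose the process exit code. Everything else in the toy model is an illustration of the blocking hand-off, not of how the JobTracker actually schedules work.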

To share resources on the cluster the JobTracker can use one...