This chapter introduces several advanced Hadoop MapReduce features that help you develop highly customized, efficient MapReduce applications.
The preceding figure depicts the typical flow of a Hadoop MapReduce computation. The InputFormat reads the input data from HDFS and parses it to create the key-value pair inputs for the map function. InputFormat also performs the logical partitioning of the data to create the Map tasks of the computation. A typical MapReduce computation creates one Map task for each input HDFS data block. Hadoop invokes the user-provided map function for each of the generated key-value pairs. As mentioned in Chapter 1, Getting Started with Hadoop v2, the optional combiner step, if provided, may be invoked with the output data from the map function.
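The map and combiner steps described above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration under stated assumptions: the class name MapCombineSketch, the word-count logic, and the two sample records are hypothetical, and the map and combine methods are stand-ins for the user-provided Mapper and combiner classes rather than Hadoop API calls.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the map and combiner steps.
public class MapCombineSketch {

    // Stand-in for a user-provided map function: called once per input
    // record, it emits a (word, 1) pair for every token in the record.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Stand-in for the optional combiner: sums the counts per key over the
    // output of a single Map task, so fewer pairs leave that task.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Two records, assumed to belong to the same Map task.
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String record : new String[] {"to be or not", "to be"}) {
            mapOutput.addAll(map(record));
        }
        System.out.println(mapOutput.size() + " pairs before combining"); // 6 pairs before combining
        System.out.println(combine(mapOutput)); // {be=2, not=1, or=1, to=2}
    }
}
```

In this sketch the combiner reduces six intermediate pairs to four. Because the combiner step is optional and only may be invoked, the final result should not depend on whether it runs.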
The Partitioner step then partitions the output data of the Map task to send it to the respective Reduce tasks. This partitioning is performed using the key field of the Map task output key...