Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Introduction


This chapter introduces you to several advanced Hadoop MapReduce features that will help you to develop highly customized, efficient MapReduce applications.

The preceding figure depicts the typical flow of a Hadoop MapReduce computation. The InputFormat reads the input data from HDFS and parses it into the key-value pairs that serve as input to the map function. The InputFormat also performs the logical partitioning of the data that determines the Map tasks of the computation. A typical MapReduce computation creates one Map task for each HDFS data block of the input. Hadoop invokes the user-provided map function for each of the generated key-value pairs. As mentioned in Chapter 1, Getting Started with Hadoop v2, the optional combiner step, if provided, may be invoked on the output data of the map function.
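The map-side flow described above can be sketched in plain Java. This is only an illustration and uses no Hadoop classes: the class MapFlowSketch and its methods readRecords, map, combine, and runMapTask are hypothetical names. The sketch assumes line-oriented text input keyed by offset, in the spirit of TextInputFormat, and a word-count style map function.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names are Hadoop APIs.
class MapFlowSketch {

    // Stand-in for the InputFormat: parses raw input into (offset, line) records.
    static List<Map.Entry<Long, String>> readRecords(String split) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : split.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            offset += line.length() + 1; // +1 for the newline; assumes single-byte characters
        }
        return records;
    }

    // Stand-in for the user-provided map function: emits (word, 1) for each word.
    static void map(long key, String value, List<Map.Entry<String, Integer>> out) {
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
    }

    // Stand-in for the optional combiner: sums the counts per key locally.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    // One simulated Map task: read records, call map once per record, then combine.
    static Map<String, Integer> runMapTask(String split) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (Map.Entry<Long, String> record : readRecords(split)) {
            map(record.getKey(), record.getValue(), mapOutput);
        }
        return combine(mapOutput);
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(runMapTask("to be or\nnot to be"));
    }
}
```

In a real job, Hadoop decides whether and how often to run the combiner, so the map output may reach the next step uncombined; the sketch always applies it only to keep the example short.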

The Partitioner step then partitions the output data of the Map task so that each record can be sent to its respective Reduce task. This partitioning is performed using the key field of the Map task output key...
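As a minimal sketch of key-based partitioning, the following plain-Java snippet assumes a hash-based scheme that uses the same arithmetic as Hadoop's default HashPartitioner. The class name HashPartitionSketch, the sample keys, and the reducer count of 3 are illustrative assumptions; no Hadoop classes are used.

```java
// Plain-Java illustration only; HashPartitionSketch is a hypothetical name.
class HashPartitionSketch {

    // Clear the sign bit of the key's hash code so the result is never negative,
    // then take the remainder modulo the number of Reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // assumed reducer count for the example
        String[] keys = {"apple", "banana", "apple", "cherry"};
        for (String key : keys) {
            // Records with equal keys always map to the same Reduce task.
            System.out.println(key + " -> reduce task " + getPartition(key, numReduceTasks));
        }
    }
}
```

Because the partition depends only on the key, every record with a given key reaches the same Reduce task regardless of which Map task produced it.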