To understand Hadoop MapReduce fundamentals properly, we will:
Understand MapReduce objects
Learn how to decide the number of Maps in MapReduce
Learn how to decide the number of Reduces in MapReduce
Understand MapReduce dataflow
Take a closer look at Hadoop MapReduce terminology
As we know, MapReduce operations in Hadoop are carried out mainly by three objects: Mapper, Reducer, and Driver.
Mapper: This is designed for the Map phase of MapReduce, which starts a MapReduce operation by reading the input files and splitting them into several pieces. For each piece, it emits key-value pairs as its output.
Reducer: This is designed for the Reduce phase of a MapReduce job; it accepts key-based grouped data from the Mapper output, reduces it by aggregation logic, and emits the (key, value) pair for the group of values.
Driver: This is the main file that drives the MapReduce process. It starts the execution of MapReduce...
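The division of work among the three objects can be illustrated with a small in-memory sketch. This is not the Hadoop API: the function names mapper, reducer, and driver, and the word-count task itself, are assumptions chosen for illustration, and the grouping step that Hadoop performs between the two phases is imitated here with a plain dictionary.

```python
from collections import defaultdict


def mapper(line):
    """Map phase (illustrative): emit one (word, 1) pair per word in a piece of input."""
    for word in line.split():
        yield word.lower(), 1


def reducer(key, values):
    """Reduce phase (illustrative): aggregate every value grouped under one key."""
    return key, sum(values)


def driver(lines):
    """Driver (illustrative): run the map step, group by key, then run the reduce step."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            # Stand-in for the key-based grouping done between Map and Reduce.
            grouped[key].append(value)
    # Reduce each group, visiting keys in sorted order.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(driver(["big data", "big cluster"]))
```

In a real Hadoop job the input pieces and the grouped keys are spread across many machines; the sketch only shows which responsibility belongs to which object.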