Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Adding a combiner step to the WordCount MapReduce program

A single Map task may output many key-value pairs with the same key. Hadoop then shuffles (moves) all of those values over the network to the Reduce tasks, which incurs significant overhead. For example, in the previous WordCount MapReduce program, when a Mapper encounters multiple occurrences of the same word in a single Map task, the map function outputs many <word,1> intermediate key-value pairs, and all of them are transmitted over the network. We can optimize this scenario by summing all the <word,1> pairs for a word into a single <word,count> pair before sending the data across the network to the Reducers.

To optimize such scenarios, Hadoop supports a special function called a combiner, which performs local aggregation of the key-value pairs output by a Map task. When a combiner is provided, Hadoop calls it on the Map task outputs before persisting the data to disk for shuffling to the Reduce tasks. This can significantly reduce the amount of data shuffled from the Map tasks to the Reduce tasks. Note that the combiner is an optional step of the MapReduce flow. Even when you provide a combiner implementation, Hadoop may invoke it for only a subset of the Map output data, or may not invoke it at all.
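You can see the effect of this local aggregation without a cluster. The following plain-Java sketch is not Hadoop code. `LocalCombineDemo`, `mapOutput`, and `combine` are hypothetical names, and a `Map.Entry` stands in for Hadoop's `Text`/`IntWritable` pair. The sketch compares how many key-value pairs one Map task would hand to the shuffle with and without a summing combiner.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java emulation of one Map task's output; not Hadoop API code.
class LocalCombineDemo {

    // Emulates the WordCount map function: one <word,1> pair per token.
    static List<Map.Entry<String, Integer>> mapOutput(String text) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Emulates a summing combiner: collapses pairs with the same key into a
    // single <word,count> pair, keeping keys in first-seen order.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> raw = mapOutput("the cat saw the dog and the bird");
        List<Map.Entry<String, Integer>> combined = combine(raw);
        System.out.println("pairs without combiner: " + raw.size());
        System.out.println("pairs with combiner:    " + combined.size());
        System.out.println(combined);
    }
}
```

For the sample sentence, the eight <word,1> pairs shrink to six pairs, because the three occurrences of "the" collapse into a single (the,3) pair.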

This recipe explains how to use a combiner with the WordCount MapReduce application introduced in the previous recipe.

How to do it...

Now let's add a combiner to the WordCount MapReduce application:

  1. The combiner must have the same interface as the reduce function. The key-value pair types emitted by the combiner must match the Reducer's input key-value pair types. For the WordCount sample, we can reuse the WordCount reduce function as the combiner, since its input and output data types are the same.
  2. Uncomment the following line in the WordCount.java file to enable the combiner for the WordCount application:
    job.setCombinerClass(IntSumReducer.class);
  3. Recompile the code by re-running the Gradle build (gradle build) or the Ant build (ant compile).
  4. Run the WordCount sample using the following command. Make sure to delete the old output directory (wc-output) before running the job.
    $ $HADOOP_HOME/bin/hadoop jar \
    hcb-c1-samples.jar \
    chapter1.WordCount wc-input wc-output
    
  5. The final results will be available from the wc-output directory.
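The type rule in step 1 can be modeled with a small hypothetical framework. This is a sketch, not Hadoop's API. `ReduceFunction`, `IntSum`, `MiniJob`, and `setCombiner` are illustrative stand-ins for Hadoop's `Reducer` class and `job.setCombinerClass(...)`. The combiner slot is typed as `ReduceFunction<K, V, K, V>`, where `K` and `V` are the Map output types. The compiler therefore accepts only functions whose input and output types match, and that is why the summing reducer can be reused.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a reduce function; not Hadoop's Reducer class.
interface ReduceFunction<KI, VI, KO, VO> {
    Map.Entry<KO, VO> reduce(KI key, List<VI> values);
}

// Sums counts. Input and output types are both <String, Integer>,
// so the same function can serve as reducer and as combiner.
class IntSum implements ReduceFunction<String, Integer, String, Integer> {
    @Override
    public Map.Entry<String, Integer> reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return Map.entry(key, sum);
    }
}

// Hypothetical job: K, V are Map output types; KO, VO are final output types.
class MiniJob<K, V, KO, VO> {
    private final ReduceFunction<K, V, KO, VO> reducer;
    private ReduceFunction<K, V, K, V> combiner; // null means "no combiner"

    MiniJob(ReduceFunction<K, V, KO, VO> reducer) { this.reducer = reducer; }

    // The signature only accepts functions mapping <K,V> back to <K,V>.
    void setCombiner(ReduceFunction<K, V, K, V> combiner) { this.combiner = combiner; }

    List<Map.Entry<KO, VO>> run(List<Map.Entry<K, V>> mapOutput) {
        List<Map.Entry<K, V>> shuffled = mapOutput;
        if (combiner != null) {
            shuffled = groupAndApply(mapOutput, combiner); // local aggregation
        }
        return groupAndApply(shuffled, reducer);
    }

    // Groups values by key (first-seen order) and applies fn once per key.
    private static <A, B, C, D> List<Map.Entry<C, D>> groupAndApply(
            List<Map.Entry<A, B>> pairs, ReduceFunction<A, B, C, D> fn) {
        Map<A, List<B>> groups = new LinkedHashMap<>();
        for (Map.Entry<A, B> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        List<Map.Entry<C, D>> out = new ArrayList<>();
        for (Map.Entry<A, List<B>> g : groups.entrySet()) {
            out.add(fn.reduce(g.getKey(), g.getValue()));
        }
        return out;
    }
}

class CombinerTypeDemo {
    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput =
            List.of(Map.entry("the", 1), Map.entry("cat", 1), Map.entry("the", 1));
        MiniJob<String, Integer, String, Integer> job = new MiniJob<>(new IntSum());
        System.out.println("without combiner: " + job.run(mapOutput));
        job.setCombiner(new IntSum()); // compiles: IntSum maps <String,Integer> to <String,Integer>
        System.out.println("with combiner:    " + job.run(mapOutput));
    }
}
```

Both runs print the same counts. Consider a reducer typed as `ReduceFunction<String, Integer, String, Double>`. It could still be the reducer of a `MiniJob<String, Integer, String, Double>`, but passing it to `setCombiner` would not compile.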

How it works...

When a combiner is provided, Hadoop calls it on the Map task outputs before persisting the data to disk for shuffling to the Reduce tasks. The combiner pre-processes the data generated by the Mapper before it is sent to the Reducer, which reduces the amount of data that needs to be transferred.

In the WordCount application, the combiner receives N (word,1) pairs as input and outputs a single (word,N) pair. For example, suppose the input processed by a Map task had 1,000 occurrences of the word "the". The Mapper generates 1,000 (the,1) pairs, while the combiner can collapse them into a single (the,1000) pair, reducing the amount of data that needs to be transferred to the Reduce tasks. The following diagram shows the usage of the combiner in the WordCount MapReduce application:

[Figure: Usage of the combiner in the WordCount MapReduce application]
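Hadoop may apply the combiner separately to different portions of a Map task's output, for example to individual spills. The Reducer can therefore receive a few partial sums instead of exactly one pair per word. The plain-Java sketch below assumes an arbitrary split of the 1,000 (the,1) values into chunks of 600 and 400. `PartialCombineDemo`, `sum`, and `combineInChunks` are hypothetical names, not Hadoop classes. Because the same summing function serves as both combiner and reducer, the final count is 1,000 either way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java emulation; no Hadoop classes involved.
class PartialCombineDemo {

    // The summing logic shared by the WordCount combiner and reducer.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    // Applies the combiner separately to consecutive chunks of the given sizes,
    // emitting one partial (the,count) value per chunk.
    static List<Integer> combineInChunks(List<Integer> values, int... chunkSizes) {
        List<Integer> partials = new ArrayList<>();
        int start = 0;
        for (int size : chunkSizes) {
            partials.add(sum(values.subList(start, start + size)));
            start += size;
        }
        if (start != values.size()) {
            throw new IllegalArgumentException("chunk sizes must cover all values");
        }
        return partials;
    }

    public static void main(String[] args) {
        List<Integer> mapOutput = Collections.nCopies(1000, 1);        // values of 1,000 (the,1) pairs
        List<Integer> partials = combineInChunks(mapOutput, 600, 400); // assumed split

        System.out.println("values shuffled without combiner: " + mapOutput.size());
        System.out.println("values shuffled with combiner:    " + partials.size());
        System.out.println("reducer total without combiner: " + sum(mapOutput));
        System.out.println("reducer total with combiner:    " + sum(partials));
    }
}
```

Only two values reach the Reducer instead of 1,000. The total stays 1,000 however the values are chunked.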

There's more...

Using the job's reduce function as the combiner only works under two conditions. The reduce function's input and output key-value data types must be the same, and its computation must be safe to apply to partial groups of values, as summation is. When you cannot reuse the reduce function as the combiner, you can write a dedicated reduce function implementation to act as the combiner. The combiner's input and output key-value pair types must be the same as the Mapper's output key-value pair types.
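A common case where the reduce function cannot be reused is computing an average per key. The following plain-Java sketch is a hypothetical example, not taken from this recipe, and uses no Hadoop classes. It follows the rule above:

- the Mapper emits `SumCount` values;
- the dedicated combiner merges `SumCount` values into a `SumCount`, so its input and output types equal the Map output types;
- only the reducer divides.

`AverageCombinerDemo`, `SumCount`, and the sample values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "average per key" job for a single key, emulated in plain Java.
class AverageCombinerDemo {

    // Map output value type: a partial sum plus the number of values it covers.
    static final class SumCount {
        final long sum;
        final long count;
        SumCount(long sum, long count) { this.sum = sum; this.count = count; }
    }

    // Mapper: each raw value becomes a SumCount covering one value.
    static SumCount map(int value) {
        return new SumCount(value, 1);
    }

    // Dedicated combiner: SumCount values in, a single SumCount out.
    static SumCount combine(List<SumCount> values) {
        long sum = 0, count = 0;
        for (SumCount sc : values) {
            sum += sc.sum;
            count += sc.count;
        }
        return new SumCount(sum, count);
    }

    // Reducer: SumCount values in, a double average out. Its output type
    // differs from its input type, so it cannot be reused as the combiner.
    static double reduce(List<SumCount> values) {
        SumCount total = combine(values);
        return (double) total.sum / total.count;
    }

    public static void main(String[] args) {
        int[][] mapTasks = { {1, 2, 3}, {10} }; // raw values seen by two Map tasks

        List<SumCount> shuffled = new ArrayList<>();
        for (int[] task : mapTasks) {
            List<SumCount> mapOut = new ArrayList<>();
            for (int v : task) mapOut.add(map(v));
            shuffled.add(combine(mapOut)); // one combined SumCount per task
        }
        System.out.println("average = " + reduce(shuffled));
    }
}
```

With the sample values the program prints `average = 4.0`, that is, (1 + 2 + 3 + 10) / 4. Reusing an averaging reducer as the combiner would instead average the per-task averages, giving (2 + 10) / 2 = 6.0, which is wrong.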

We reiterate that the combiner is an optional step of the MapReduce flow. Even when you provide a combiner implementation, Hadoop may invoke it for only a subset of the Map output data, or may not invoke it at all. Take care not to use the combiner for any essential part of the computation, because Hadoop does not guarantee that the combiner will execute.
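The consequence of this lack of guarantee can be simulated in plain Java. In the sketch below, which uses hypothetical names and no Hadoop classes, a framework applies the combiner to only the first `combinedPrefix` Map output pairs. This stands in for Hadoop invoking the combiner on all, some, or none of the data. A combiner that only sums gives the same result in every case. A combiner that also lowercases words, an essential task, produces different results depending on how much of the output it saw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Plain-Java emulation of a framework that may combine only part of the Map output.
class OptionalCombinerDemo {

    // Safe combiner: only aggregates <word,count> pairs by key.
    static List<Map.Entry<String, Integer>> sumCombiner(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) sums.merge(p.getKey(), p.getValue(), Integer::sum);
        return new ArrayList<>(sums.entrySet());
    }

    // Flawed combiner: also lowercases keys, essential work that is silently
    // skipped for any pairs the framework does not pass through the combiner.
    static List<Map.Entry<String, Integer>> lowercasingCombiner(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> lowered = new ArrayList<>();
        for (Map.Entry<String, Integer> p : pairs) lowered.add(Map.entry(p.getKey().toLowerCase(), p.getValue()));
        return sumCombiner(lowered);
    }

    // map -> combiner on the first combinedPrefix pairs only -> summing reducer.
    static Map<String, Integer> run(List<String> words, int combinedPrefix,
                                    UnaryOperator<List<Map.Entry<String, Integer>>> combiner) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String w : words) mapOutput.add(Map.entry(w, 1));

        List<Map.Entry<String, Integer>> shuffled =
            new ArrayList<>(combiner.apply(mapOutput.subList(0, combinedPrefix)));
        shuffled.addAll(mapOutput.subList(combinedPrefix, mapOutput.size()));

        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> p : shuffled) result.merge(p.getKey(), p.getValue(), Integer::sum);
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("The", "the", "cat", "The");
        for (int prefix : new int[] {0, 2, 4}) {
            System.out.println("combiner on " + prefix + " pairs: safe="
                + run(words, prefix, OptionalCombinerDemo::sumCombiner)
                + " flawed=" + run(words, prefix, OptionalCombinerDemo::lowercasingCombiner));
        }
    }
}
```

For the sample words, the summing combiner always yields {The=2, cat=1, the=1}. The lowercasing combiner yields {The=2, cat=1, the=1}, {The=1, cat=1, the=2}, or {cat=1, the=3}, depending on how much of the output it saw. Normalization like this belongs in the Mapper, which always runs.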

Using a combiner does not yield significant gains in the non-distributed modes. However, in distributed setups, such as the one described in the Setting up Hadoop YARN in a distributed cluster environment using Hadoop v2 recipe, a combiner can provide significant performance gains by reducing the amount of data shuffled across the network.
