Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Broadcasting and distributing shared resources to tasks in a MapReduce job – Hadoop DistributedCache


We can use the Hadoop DistributedCache to distribute read-only file-based resources to the Map and Reduce tasks. These resources can be simple data files, archives, or JAR files that are needed for the computations performed by the Mappers or the Reducers. Hadoop copies the cached files to the worker nodes before any tasks of the job are executed, so each file is transferred once per job rather than once per task.

How to do it...

The following steps show you how to add a file to the Hadoop DistributedCache and how to retrieve it from the Map and Reduce tasks:

  1. Copy the resource to HDFS. You can also use a file that is already in HDFS.

    $ hadoop fs -copyFromLocal ip2loc.dat ip2loc.dat
    
  2. Add the resource to the DistributedCache from your driver program. The fragment after the # sign (ip2location) is the name of the symbolic link that Hadoop creates to the cached file in each task's working directory:

    Job job = Job.getInstance(...);
    ...
    job.addCacheFile(new URI("ip2loc.dat#ip2location"));
  3. Retrieve the resource in the setup() method of your Mapper or Reducer and use the data in the map() or reduce() function:

    public class LogProcessorMap extends Mapper<Object, LogWritable, Text, IntWritable>...
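The truncated Mapper above might be completed along the following lines. This is a sketch, not the book's exact listing: it assumes LogWritable provides a getUserIP() accessor (as in the book's log-processing examples), and it assumes a hypothetical tab-separated format for the ip2loc.dat file. The key point is that setup() can open the cached file through the ip2location symlink created in the task's working directory:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LogProcessorMap extends Mapper<Object, LogWritable, Text, IntWritable> {

    // In-memory lookup table loaded once per task from the cached file.
    private final Map<String, String> ipToLocation = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // "ip2location" is the symlink name given after '#' in the cache URI;
        // Hadoop creates it in the task's working directory, so a relative
        // path is enough to open the file.
        try (BufferedReader reader = new BufferedReader(new FileReader("ip2location"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumed format: "<ip><TAB><location>" per line (hypothetical).
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    ipToLocation.put(parts[0], parts[1]);
                }
            }
        }
    }

    @Override
    protected void map(Object key, LogWritable value, Context context)
            throws IOException, InterruptedException {
        // Use the cached data in the actual computation; getUserIP() is
        // assumed from the book's LogWritable example.
        String location = ipToLocation.getOrDefault(
                value.getUserIP().toString(), "unknown");
        context.write(new Text(location), new IntWritable(1));
    }
}
```

Loading the file in setup() rather than in map() matters: setup() runs once per task, while map() runs once per input record, so the lookup table is built a single time and then reused for every record the task processes.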