We can use the Hadoop DistributedCache to distribute read-only file-based resources to the Map and Reduce tasks. These resources can be simple data files, archives, or JAR files that are needed for the computations performed by the Mappers or the Reducers.
The following steps show you how to add a file to the Hadoop DistributedCache and how to retrieve it from the Map and Reduce tasks:
Copy the resource to HDFS. You can also use files that already exist in HDFS.
$ hadoop fs -copyFromLocal ip2loc.dat ip2loc.dat
Add the resource to the DistributedCache from your driver program:
Job job = Job.getInstance(...);
...
job.addCacheFile(new URI("ip2loc.dat#ip2location"));
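For context, a fuller driver sketch might look like the following. The job name, input/output path handling, and the LogProcessorDriver class name are illustrative assumptions, not part of the original recipe:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogProcessorDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "log-processor");
    job.setJarByClass(LogProcessorDriver.class);

    // The #ip2location fragment makes the cached file available as a
    // symlink named "ip2location" in each task's working directory.
    job.addCacheFile(new URI("ip2loc.dat#ip2location"));

    // ... set Mapper/Reducer classes and key/value types here ...

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```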
Retrieve the resource in the setup() method of your Mapper or Reducer and use the data in the map() or reduce() function:
public class LogProcessorMap
    extends Mapper<Object, LogWritable, Text, IntWritable> {
  ...
}