In the previous chapter, we downloaded and ran our first Hadoop MapReduce job that used the ES-Hadoop library. Let's get inside the WordCount job to understand how it is developed.
Here is how WordsMapper.java looks:
package com.packtpub.esh;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordsMapper extends Mapper<Object, Text, Text, IntWritable> {

    // Constant count of 1 that is emitted with every word
    private final static IntWritable one = new IntWritable(1);

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // The key is not used; split the input line (value) on whitespace
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            Text word = new Text();
            word.set(itr.nextToken());
            // Emit a (word, 1) pair for every token
            context.write(word, one);
        }
    }
}
This Mapper will look very familiar to all MapReduce developers...
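To see what the Mapper emits for a single line without starting a Hadoop cluster, the following is a minimal, Hadoop-free sketch. It assumes that the tokenizing loop is the only interesting part of map(); the class name WordsMapperSketch and its emit method are hypothetical, and a plain List stands in for the Hadoop Context that would normally receive each pair.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical, Hadoop-free sketch of the loop in WordsMapper.map():
// every whitespace-separated token becomes a (word, 1) pair.
public class WordsMapperSketch {

    // Mirrors the StringTokenizer loop; the returned List stands in
    // for the Hadoop Context that would receive each pair.
    static List<Map.Entry<String, Integer>> emit(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints one "word -> 1" line per token, in input order
        for (Map.Entry<String, Integer> pair : emit("to be or not to be")) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

Note that the map step does not aggregate anything: "to" and "be" are each emitted twice with a count of 1, and summing the repeated pairs is left to the reduce phase. Also, StringTokenizer splits only on whitespace by default, so punctuation stays attached to words (for example, "be," is emitted as a separate key from "be").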