
Understanding the WordCount program


In the previous chapter, we downloaded and ran our first Hadoop MapReduce job using the ES-Hadoop library. Now let's look inside the WordCount job to understand how it is built.
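Before diving into the Mapper, it helps to recall how such a job might be wired to Elasticsearch. The following is a minimal sketch of a driver, not the book's actual driver class; the class name DriverSketch, the index eshadoop/wordcount, and the node address localhost:9200 are assumptions, while es.nodes, es.resource, and EsOutputFormat are genuine ES-Hadoop configuration points.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class DriverSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("es.nodes", "localhost:9200");        // assumed Elasticsearch node
    conf.set("es.resource", "eshadoop/wordcount"); // assumed target index/type

    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(DriverSketch.class);
    job.setMapperClass(WordsMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // Retries from speculative execution can cause duplicate writes to
    // Elasticsearch, so they are typically disabled for ES-Hadoop jobs
    job.setSpeculativeExecution(false);
    // ES-Hadoop's EsOutputFormat writes the job's output to Elasticsearch;
    // the reducer and its Elasticsearch-friendly output types are omitted here
    job.setOutputFormatClass(EsOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}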

Understanding Mapper

Here is how WordsMapper.java looks:

package com.packtpub.esh;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordsMapper extends Mapper<Object, Text, Text, IntWritable> {

  // Reusable constant count of 1, emitted with every word
  private final static IntWritable one = new IntWritable(1);

  // Reusable Text key, to avoid allocating a new object per token
  private final Text word = new Text();

  @Override
  public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    // Split the incoming line into whitespace-separated tokens
    StringTokenizer itr = new StringTokenizer(value.toString());

    while (itr.hasMoreTokens()) {
      // Emit each token as a (word, 1) pair
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

To any MapReduce developer, this Mapper will look very familiar...
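To make the emitted pairs concrete, here is a hypothetical standalone snippet, not part of the book's code, that mimics the tokenize-and-emit logic of map() on a single input line and prints the (word, 1) pairs it would produce:

import java.util.StringTokenizer;

public class MapperLogicDemo {
  public static void main(String[] args) {
    // One sample line of input, as map() would receive in its value parameter
    String line = "to be or not to be";
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      // Duplicates are emitted as-is; the shuffle phase later groups them
      // by key so the reducer can sum the 1s into a total count per word
      System.out.println("(" + itr.nextToken() + ", 1)");
    }
  }
}

Running this prints (to, 1), (be, 1), (or, 1), (not, 1), (to, 1), (be, 1); those duplicate keys are exactly what the shuffle-and-sort phase consolidates before the reducer runs.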