
Understanding the WordCount program


In the previous chapter, we downloaded and ran our first Hadoop MapReduce job using the ES-Hadoop library. Now let's look inside the WordCount job to understand how it is built.
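Before diving into the Mapper, it helps to recall how such a job might be wired to Elasticsearch. The following is a minimal sketch of a driver, not the book's actual driver class; the class name DriverSketch, the index eshadoop/wordcount, and the node address localhost:9200 are assumptions, while es.nodes, es.resource, and EsOutputFormat are genuine ES-Hadoop configuration points.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class DriverSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("es.nodes", "localhost:9200");        // assumed Elasticsearch node
    conf.set("es.resource", "eshadoop/wordcount"); // assumed target index/type

    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(DriverSketch.class);
    job.setMapperClass(WordsMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // Retries from speculative execution can cause duplicate writes to
    // Elasticsearch, so they are typically disabled for ES-Hadoop jobs
    job.setSpeculativeExecution(false);
    // ES-Hadoop's EsOutputFormat writes the job's output to Elasticsearch;
    // the reducer and its Elasticsearch-friendly output types are omitted here
    job.setOutputFormatClass(EsOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}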

Understanding Mapper

Here is how WordsMapper.java looks:

package com.packtpub.esh;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordsMapper extends Mapper<Object, Text, Text, IntWritable> {

  // Reusable constant count of 1, emitted with every word
  private final static IntWritable one = new IntWritable(1);

  // Reusable Text key, to avoid allocating a new object per token
  private final Text word = new Text();

  @Override
  public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    // Split the incoming line into whitespace-separated tokens
    StringTokenizer itr = new StringTokenizer(value.toString());

    while (itr.hasMoreTokens()) {
      // Emit each token as a (word, 1) pair
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

To any MapReduce developer, this Mapper will look very familiar...
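To make the emitted pairs concrete, here is a hypothetical standalone snippet, not part of the book's code, that mimics the tokenize-and-emit logic of map() on a single input line and prints the (word, 1) pairs it would produce:

import java.util.StringTokenizer;

public class MapperLogicDemo {
  public static void main(String[] args) {
    // One sample line of input, as map() would receive in its value parameter
    String line = "to be or not to be";
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      // Duplicates are emitted as-is; the shuffle phase later groups them
      // by key so the reducer can sum the 1s into a total count per word
      System.out.println("(" + itr.nextToken() + ", 1)");
    }
  }
}

Running this prints (to, 1), (be, 1), (or, 1), (not, 1), (to, 1), (be, 1); those duplicate keys are exactly what the shuffle-and-sort phase consolidates before the reducer runs.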