To demonstrate how MapReduce works, we illustrate the example of a word count, which counts the number of occurrences of each word in a given input set. In this recipe, we will demonstrate how to use rmr2
to implement a word count problem.
In this recipe, we will need an input file as our word count program input. You can download the example input from https://github.com/ywchiu/ml_R_cookbook/tree/master/CH12.
Perform the following steps to implement the word count program:
- First, you need to configure the system environment, and then load
rmr2
andrhdfs
into an R session. You may need to update the use of the JAR file if you use a different version of QuickStart VM:
> Sys.setenv(HADOOP_CMD="/usr/bin/hadoop") > Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop -streaming-2.5.0-cdh5.2.0.jar ") > library(rmr2) > library(rhdfs) > hdfs.init()
- You can...