Hadoop MapReduce v2 Cookbook - Second Edition
This recipe describes how to run a MapReduce computation in a distributed Hadoop v2 cluster.
Start the Hadoop cluster by following the Setting up HDFS recipe or the Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution recipe.
Now let's run the WordCount sample in the distributed Hadoop v2 setup:
1. Upload the wc-input directory in the source repository to the HDFS filesystem. Alternatively, you can upload any other set of text documents as well:

$ hdfs dfs -copyFromLocal wc-input .
2. Execute the WordCount example by running the hcb-c1-samples.jar file from the HADOOP_HOME directory:

$ hadoop jar hcb-c1-samples.jar \
    chapter1.WordCount \
    wc-input wc-output
3. List the output directory, and then look at the results:

$ hdfs dfs -ls wc-output
Found 3 items
-rw-r--r--   1 joe supergroup    0 2013-11-09 09:04 /data/output1/_SUCCESS
drwxr-xr-x   - joe supergroup    0 2013-11-09 09:04 /data/output1/_logs
-rw-r--r--   1 joe supergroup 1306 2013-11-09 09:04 /data/output1/part-r-00000

$ hdfs dfs -cat wc-output/part*
When we submit a job, YARN schedules a MapReduce ApplicationMaster to coordinate and execute the computation. The ApplicationMaster requests the necessary resources from the ResourceManager and executes the MapReduce computation using the containers it receives in response to that request.
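The computation the containers carry out can be illustrated with a minimal, framework-free sketch of the three phases a WordCount job goes through: map, shuffle/sort, and reduce. This is an assumption-laden illustration of the general technique, not the code inside hcb-c1-samples.jar; the function names and sample input are invented for the example.

```python
# Illustrative sketch of WordCount's map -> shuffle -> reduce flow.
# Not the actual chapter1.WordCount source; names and inputs are made up.
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in the input line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group intermediate values by key, as the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for each word."""
    return key, sum(values)


lines = ["the quick brown fox", "the lazy dog"]
intermediate = [pair for line in lines for pair in map_phase(line)]
results = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(results)  # 'the' appears twice; every other word appears once
```

In the real job, each mapper processes a split of the input files in wc-input, and each reducer writes its summed counts to one part-r-NNNNN file in wc-output.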
You can also see the results of the WordCount application through the HDFS monitoring UI by visiting http://NAMENODE:50070.