Let's run an example job that counts the frequency of each word across files:
We need to input some text to run the example. We will download Alice's Adventures in Wonderland from Project Gutenberg's site:
ubuntu@master:~ $ wget http://www.gutenberg.org/files/11/11.txt -O /tmp/alice.txt
Create an input directory on HDFS and put the downloaded file into it:
ubuntu@master:~ $ bin/hadoop dfs -mkdir input
ubuntu@master:~ $ bin/hadoop dfs -put /tmp/alice.txt input
Run the Hadoop WordCount example from the hadoop-examples jar file, which is part of the Hadoop distribution:
ubuntu@master:~ $ bin/hadoop jar hadoop-examples.jar wordcount input output
The output of the program will be in the output directory on HDFS. We can see the output using the following command:
ubuntu@master:~ $ bin/hadoop dfs -cat output/*
The output will list each word together with its frequency, one word per line.
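To get a feel for what WordCount computes before running it on the cluster, the same map/sort/reduce idea can be approximated locally with standard Unix tools. This is only an illustrative sketch on a small sample file (the file name and sample text are made up here, not part of the Hadoop job); the real job distributes these phases across the cluster:

```shell
# Create a tiny sample input (stand-in for /tmp/alice.txt)
printf 'the cat sat on the mat the end\n' > /tmp/sample.txt

# "map": emit one word per line; "shuffle/sort": group identical words;
# "reduce": count each group. Output format mirrors WordCount: word<TAB>count
tr -s ' ' '\n' < /tmp/sample.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
```

Each output line has a word and its count, which is the same shape as the records you will see in HDFS under output/ after the Hadoop job finishes.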