Hadoop MapReduce works by submitting a fat jar (a JAR file that contains the application code together with all of its dependencies) to the JobTracker. We can generate this jar from the sources accompanying this book with the following command:
$ mvn clean package
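For reference, the `WordCountJob` class used in the commands below is assumed to be a minimal Scalding job along these lines (a sketch using Scalding's fields-based API; the exact field names are illustrative):

```scala
import com.twitter.scalding._

// Read text lines, split each line into words, and count occurrences per word.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String =>
      line.toLowerCase.split("\\s+").filter(_.nonEmpty)
    }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}
```

The `--input` and `--output` arguments passed on the command line are picked up through `args("input")` and `args("output")`.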
We can test this jar by executing it in Cascading local mode (without using anything from Hadoop) with the following command:
$ java -Xmx1024m -cp target/chapter2-0-jar-with-dependencies.jar com.twitter.scalding.Tool WordCountJob --local --input input.txt --output output.txt
Then, we can start leveraging Hadoop by using the hadoop jar command to execute the job:
$ hadoop jar target/chapter2-0-jar-with-dependencies.jar com.twitter.scalding.Tool WordCountJob --local --input input.txt --output output.txt
Now, we are ready to submit this job to a Hadoop cluster and use the Hadoop Distributed File System. First, we have to create an HDFS folder and push the input data with the help of the following commands...
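A typical sequence for staging the input looks like the following (a sketch assuming a configured cluster and the same input.txt; the folder name is illustrative):

```shell
# Create an HDFS directory to hold the job input
$ hadoop fs -mkdir input
# Copy the local input file into that HDFS directory
$ hadoop fs -put input.txt input/
# Verify that the file landed where we expect
$ hadoop fs -ls input
```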