Hadoop MapReduce works by submitting a fat jar (a JAR file that contains the application code together with all of its dependencies) to the JobTracker. We can generate this jar from the sources accompanying this book with the following command:
$ mvn clean package
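For reference, the `WordCountJob` class used in the commands below is assumed to be a minimal Scalding job along these lines (a sketch using Scalding's fields-based API; the exact field names are illustrative):

```scala
import com.twitter.scalding._

// Read text lines, split each line into words, and count occurrences per word.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String =>
      line.toLowerCase.split("\\s+").filter(_.nonEmpty)
    }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}
```

The `--input` and `--output` arguments passed on the command line are picked up through `args("input")` and `args("output")`.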
We can test this jar by executing it in Cascading local mode (without using anything from Hadoop) with the following command:
$ java -Xmx1024m -cp target/chapter2-0-jar-with-dependencies.jar com.twitter.scalding.Tool WordCountJob --local --input input.txt --output output.txt
Then, we can start leveraging Hadoop by using the hadoop jar command to execute the job:
$ hadoop jar target/chapter2-0-jar-with-dependencies.jar com.twitter.scalding.Tool WordCountJob --local --input input.txt --output output.txt
Now, we are ready to submit this job to a Hadoop cluster and use the Hadoop Distributed File System. First, we have to create an HDFS folder and push the input data with the help of the following commands...
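A typical sequence for staging the input looks like the following (a sketch assuming a configured cluster and the same input.txt; the folder name is illustrative):

```shell
# Create an HDFS directory to hold the job input
$ hadoop fs -mkdir input
# Copy the local input file into that HDFS directory
$ hadoop fs -put input.txt input/
# Verify that the file landed where we expect
$ hadoop fs -ls input
```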