Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

Submitting a Scalding job in Hadoop


Hadoop MapReduce works by submitting a fat jar (a JAR file that contains all the dependencies and the application code) to the JobTracker. We can generate this jar using the sources accompanying this book with the following command:

$ mvn clean package
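The book's `pom.xml` is not reproduced in this section, but an artifact named `*-jar-with-dependencies.jar` is conventionally produced by the `maven-assembly-plugin` with the built-in `jar-with-dependencies` descriptor. A minimal sketch of that configuration (the bound phase is a typical choice, not confirmed by the book) looks like this:

```xml
<!-- Sketch of a maven-assembly-plugin setup that builds a fat JAR during
     `mvn package`; the book's actual pom.xml may differ. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <!-- Built-in descriptor that bundles all dependencies into one JAR -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>single</goal></goals>
    </execution>
  </executions>
</plugin>
```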

We can test the resulting JAR by executing it in Cascading local mode (without using anything from Hadoop) with the following command:

$ java -Xmx1024m -cp target/chapter2-0-jar-with-dependencies.jar com.twitter.scalding.Tool WordCountJob --local --input input.txt --output output.txt
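The `WordCountJob` class passed to `com.twitter.scalding.Tool` above lives in the book's accompanying sources, which are not reproduced here. As a rough guide, a canonical Scalding word-count job in the Fields-based API (the field names `'line` and `'word` and the tokenization are illustrative assumptions) looks like this:

```scala
import com.twitter.scalding._

// Sketch of a Scalding word-count job; the book's actual WordCountJob may differ.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))                       // read the --input file line by line
    .flatMap('line -> 'word) { line: String =>  // split each line into words
      line.toLowerCase.split("\\s+")
    }
    .groupBy('word) { _.size }                  // count occurrences of each word
    .write(Tsv(args("output")))                 // write word/count pairs to --output
}
```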

Then, we can start leveraging Hadoop by executing the job through the hadoop jar command:

$ hadoop jar target/chapter2-0-jar-with-dependencies.jar com.twitter.scalding.Tool WordCountJob --local --input input.txt --output output.txt

Now, we are ready to submit this job to a Hadoop cluster and use the Hadoop Distributed File System. First, we have to create an HDFS folder and push the input data with the help of the following commands...
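The commands themselves are cut off in this excerpt. Assuming the local input file is input.txt and the user's HDFS home directory is writable, a typical sequence would look like the following (the directory names and the `--hdfs` run are illustrative assumptions, not the book's exact commands):

```
# Create an input directory in HDFS and upload the local file (paths are assumptions)
$ hadoop fs -mkdir -p input
$ hadoop fs -put input.txt input/

# Run the job against HDFS; Scalding's Tool selects Hadoop mode with --hdfs
$ hadoop jar target/chapter2-0-jar-with-dependencies.jar com.twitter.scalding.Tool \
    WordCountJob --hdfs --input input/input.txt --output output

# Inspect the results written by the reducers
$ hadoop fs -cat output/part-*
```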