Programming MapReduce with Scalding
Deploying an application involves two steps: using a build tool to package the application into a JAR file, and copying that JAR to a client node of the Hadoop cluster. Execution is then straightforward and very similar to submitting any other JAR file to a Hadoop cluster, as shown in the following command:
$ hadoop jar myjar.jar com.twitter.scalding.Tool mypackage.MyJob --hdfs --input /data/set1/ --output /output/res1/
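Here, mypackage.MyJob is the fully qualified name of the job class packaged in myjar.jar, and the --hdfs flag tells com.twitter.scalding.Tool to run against the cluster rather than in local mode. As a minimal sketch of what such a class might contain (the word-count logic and field names are illustrative, not taken from this chapter), consider:

package mypackage

import com.twitter.scalding._

class MyJob(args: Args) extends Job(args) {
  // --input and --output arrive through the Args object parsed by Tool.
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.split("""\s+""") }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}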
The submitted job runs with the same HDFS permissions as the user who submitted it. If the required read and write permissions are in place, it will process the input and store the resulting data.
When storing data in HDFS, a Scalding application writes to the output folder defined by a sink in the job. Any existing content in that folder is purged every time the job begins execution.
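Because of this purge behavior, one defensive pattern (a sketch only; the class name and run-suffix scheme are hypothetical, not from this chapter) is to derive a fresh output folder for each run:

package mypackage

import com.twitter.scalding._

class TimestampedOutputJob(args: Args) extends Job(args) {
  // Hypothetical safeguard: append a run-specific suffix so a re-run
  // writes to a new folder instead of purging the previous results.
  val outDir = args("output") + "/run-" + System.currentTimeMillis

  // Copy the input through unchanged; any real pipeline would go here.
  Tsv(args("input"))
    .read
    .write(Tsv(outDir))
}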
Internally, the JAR file is submitted to the JobTracker service, which orchestrates the execution of the map and reduce phases. The actual tasks are executed...