We will see how to write a simple MapReduce job for word count and schedule it via Oozie. Later, we will wrap this in our first Coordinator job. Along this journey, we will learn some concepts and apply them in examples.
I have already saved a word count Java MapReduce program, which we will run over our input data. Let's dive into the code. You can check out the mapreduce folder in Book_Code_Folder/learn_oozie/ch04/.
Note

Check the workflow_0.5.xsd file in the xsd_svg folder and note the inputs needed for the MapReduce action to run.
The Workflow is shown in the following code, and we can see that the arguments are the same as the ones we need in the hadoop jar command for running a MapReduce job. At the start of the job, we delete the output folder, because Hadoop fails the job if the output folder already exists.
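As a reference, the shape of such a map-reduce action is sketched below. This is a minimal illustration, not the book's exact Workflow: the property names shown assume the new MapReduce API (hence the mapred.mapper.new-api and mapred.reducer.new-api flags), and the input/output paths are hypothetical placeholders.

```xml
<!-- Sketch of a map-reduce action with a prepare step that deletes the
     output folder before the job starts. Paths and property values are
     illustrative assumptions, not the book's actual configuration. -->
<action name="wordcount">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <!-- Delete the output folder so Hadoop does not fail the job -->
            <delete path="${nameNode}/user/${wf:user()}/output"/>
        </prepare>
        <configuration>
            <!-- Use the new MapReduce API (an assumption about this code) -->
            <property>
                <name>mapred.mapper.new-api</name>
                <value>true</value>
            </property>
            <property>
                <name>mapred.reducer.new-api</name>
                <value>true</value>
            </property>
            <property>
                <name>mapreduce.map.class</name>
                <value>life.jugnu.learnoozie.ch04.WordCountMapper</value>
            </property>
            <property>
                <name>mapreduce.reduce.class</name>
                <value>life.jugnu.learnoozie.ch04.WordCountReducer</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

The prepare block runs before the action itself, which is the idiomatic Oozie way to make a MapReduce action rerunnable.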
The mapper that we need is life.jugnu.learnoozie.ch04.WordCountMapper and the reducer is life.jugnu.learnoozie.ch04.WordCountReducer. Both of them are present...
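For orientation, these two classes very likely follow the classic Hadoop word count pattern; the sketch below is an assumption about their shape (the real code is in the book's mapreduce folder), using only standard org.apache.hadoop.mapreduce APIs. In the real project each class lives in its own source file.

```java
package life.jugnu.learnoozie.ch04;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: emits (word, 1) for every token in each input line.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Sketch only: sums the counts for each word (in its own file in practice).
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
                          Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```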