In this section, we will learn how to run Hadoop Streaming jobs using Oozie. Hadoop Streaming lets you write MapReduce code in languages other than Java, such as Python, C++, and Ruby.
Note
Read the Oozie documentation at https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.2_Map-Reduce_Action and write a Workflow to run a Streaming job. Then schedule it using a Coordinator. You can refer to the sample Python mapper and reducer code available at http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/.
Save the Python code from the preceding web links as mapper.py and reducer.py in the streaming folder.
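For orientation, here is a minimal word-count sketch of what such a mapper and reducer might do. It is not the code from the linked tutorial; the function names map_lines, reduce_lines, and run_locally are hypothetical and chosen only for illustration. It assumes the usual Streaming convention of tab-separated key/value lines, with the reducer receiving its input sorted by key.

```python
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Sum the counts for each word.

    Assumes the input is already sorted by key, which is what Hadoop
    provides to a streaming reducer after the shuffle phase.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


def run_locally(lines):
    """Hypothetical helper: simulate map -> sort -> reduce without a cluster."""
    mapped = sorted(map_lines(lines))  # stands in for Hadoop's shuffle/sort
    return list(reduce_lines(mapped))
```

In an actual mapper.py you would feed sys.stdin to map_lines and print each record, and do the same with reduce_lines in reducer.py. The run_locally helper is only a convenient way to check the logic before submitting the Workflow.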
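The <mapper> and <reducer> tags inside <streaming> name the scripts to run, and the <file> tags ship mapper.py and reducer.py with the job so that they are available to the tasks.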
The Workflow looks like this:
<workflow-app name="Mapreduce_Streaming_example" xmlns="uri:oozie:workflow:0.5"> <start to="streaming-c097"/> <kill name="Kill"> <message>Action failed, error message...