In this recipe, we will look at how to execute parallel jobs using the Oozie fork node. Here, we will execute one Hive job and one Pig job in parallel.
To perform this recipe, you should have a running Hadoop cluster with the latest versions of Oozie, Hive, and Pig installed on it.
For parallel execution, we need to use the fork node provided by Oozie. The following is a sample workflow that executes Hive and Pig jobs in parallel:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">
    <start to="fork-node"/>
    <fork name="fork-node">
        <path start="pig-node"/>
        <path start="hive-node"/>
    </fork>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
            </prepare>
            <configuration...
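The sample above is truncated. As a rough sketch of the overall shape such a workflow takes, the following minimal fork/join definition shows how the two action paths rejoin before the workflow ends. The script names (`script.pig`, `script.q`) and node names outside those shown above are illustrative assumptions, not taken from the original:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">
    <start to="fork-node"/>
    <!-- The fork node starts both action paths concurrently -->
    <fork name="fork-node">
        <path start="pig-node"/>
        <path start="hive-node"/>
    </fork>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- script name is an assumption for illustration -->
            <script>script.pig</script>
        </pig>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- script name is an assumption for illustration -->
            <script>script.q</script>
        </hive>
        <ok to="join-node"/>
        <error to="fail"/>
    </action>
    <!-- The join node waits for all forked paths to finish -->
    <join name="join-node" to="end"/>
    <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Note that every path started by a fork must transition to the same join node; the workflow proceeds past the join only once all parallel actions have completed successfully.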