We will start this chapter with a case study. In the previous chapter, we created our first Oozie Workflow, which deleted a given directory; we will build on top of that.
In this chapter, our use case is as follows.
On a daily basis, we get incoming data in an HDFS directory. Our Workflow processes that data with a simple Pig script. If we find the directory empty, we send an email to the support team stating that we did not get any data today. This is a very common data ingestion pattern in Hadoop for file-based loads.
This example introduces several concepts; I chose to present them through the example rather than explaining each concept first and showing the example later. Using this example, we will cover the following concepts:
Decision nodes
Expression language
Oozie command-line execution
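Before we build the real thing, here is a rough sketch of how these three concepts could fit together in a single Workflow definition. It is only an illustration under stated assumptions, not the Workflow we will end up with: the node names, the script name process.pig, and the properties inputDir and supportEmail are hypothetical, and the email action assumes that SMTP settings are already configured on the Oozie server.

```xml
<!-- Illustrative sketch only. Node names, process.pig, and the
     properties inputDir and supportEmail are assumptions made for
     this example. -->
<workflow-app name="daily-ingest-sketch" xmlns="uri:oozie:workflow:0.4">
    <start to="check-input"/>

    <!-- Decision node: the expression language (EL) predicate picks the path.
         fs:dirSize returns the total size in bytes of the files directly
         inside the directory, or -1 if the path does not exist. -->
    <decision name="check-input">
        <switch>
            <case to="process-data">${fs:dirSize(inputDir) gt 0}</case>
            <default to="notify-support"/>
        </switch>
    </decision>

    <!-- Data arrived: process it with a Pig script. -->
    <action name="process-data">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>process.pig</script>
            <param>INPUT=${inputDir}</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- No data arrived: tell the support team. -->
    <action name="notify-support">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${supportEmail}</to>
            <subject>No incoming data today</subject>
            <body>The directory ${inputDir} was empty or missing.</body>
        </email>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>

    <!-- Command-line execution, assuming an Oozie server on localhost:
         oozie job -oozie http://localhost:11000/oozie -config job.properties -run -->
</workflow-app>
```

Note that a size check treats a directory containing only zero-byte files the same as an empty one; whether that is acceptable depends on how the upstream system delivers its files.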
Let's get started. The data ingestion pipeline for our use case can be represented as follows:
Open Hue and go to Editor | Workflows to create...