Let's see the Pig script that will help us calculate the maximum rainfall in each month.
I have saved the input data for this chapter in the input folder placed at BOOK_CODE_HOME/learn_oozie/ch05.
If you copied the source code for this folder to HDFS at the start of the chapter, the input data is already in the right place inside HDFS. If not, you can copy it to HDFS now.
The input data is comma separated and the columns in the data are as follows:
Product code
Bureau of Meteorology station number
Year, Month, Day
Rainfall amount (millimeters)
Period over which rainfall was measured (days)
Quality
We will write a Pig script that loads the raw input data, groups it by year and month, and then calculates the maximum rainfall for each month.
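To make the load, group, and maximum steps concrete before reading the Pig script, here is a minimal Python sketch of the same logic. It is an illustration only, not the script from the book. It assumes that Year, Month, and Day are three separate comma-separated columns (so rainfall is the sixth field), that rows with an empty rainfall value should be skipped, and the sample rows and the name max_rainfall_by_month are made up for this example.

```python
import csv
import io

# Hypothetical sample rows in the column order listed above (values are made up):
# product code, station number, year, month, day, rainfall (mm), period (days), quality
SAMPLE = """\
IDCJAC0009,066062,2014,1,1,0.0,1,Y
IDCJAC0009,066062,2014,1,2,12.4,1,Y
IDCJAC0009,066062,2014,1,3,,,
IDCJAC0009,066062,2014,2,1,3.2,1,N
IDCJAC0009,066062,2014,2,2,30.6,1,Y
"""


def max_rainfall_by_month(lines):
    """Return {(year, month): maximum rainfall in mm} for the given CSV lines."""
    result = {}
    for row in csv.reader(lines):
        # Assumption: skip blank lines and rows with no rainfall reading.
        if len(row) < 6 or not row[5].strip():
            continue
        key = (int(row[2]), int(row[3]))  # group by (year, month)
        rainfall = float(row[5])
        # Keep the largest rainfall value seen so far for this group.
        if key not in result or rainfall > result[key]:
            result[key] = rainfall
    return result


monthly_max = max_rainfall_by_month(io.StringIO(SAMPLE))
for (year, month), rainfall in sorted(monthly_max.items()):
    print(year, month, rainfall)
```

The dictionary keyed by (year, month) plays the role of the grouping step, and the running comparison plays the role of the maximum taken over each group.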
The following Pig script is present at the path BOOK_CODE_HOME/learn_oozie/ch05/rainfall/pig:
-- Pig script to find the maximum rainfall in a given month
A = load '${pig_input}' using PigStorage(',') as (product_code:chararray,station_number...