This recipe runs a basic Pig script. As the sample dataset, we will use the Human Development Report (HDR) data, which lists the Gross National Income (GNI) per capita for each country. The dataset is available at http://hdr.undp.org/en/statistics/data/. The recipe uses Pig to process the dataset and produce a list of countries whose GNI per capita exceeds $2000, sorted by GNI.
This section describes how to use a Pig Latin script to find, in the HDR dataset, the countries with a GNI per capita above $2000, sorted by GNI.
1. From the sample code, copy the dataset resources/chapter5/hdi-data.csv to the PIG_HOME/bin directory.
2. From the sample code, copy the Pig script resources/chapter5/countryFilter.pig to PIG_HOME/bin.
3. Open the Pig script in your favorite editor. It will look like the following:
A = load 'hdi-data.csv' using PigStorage(',') AS (id:int, country:chararray, hdi:float, lifeex:int, mysch:int, eysch...
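To make the intent of the script concrete, the following plain-Python sketch reproduces the two operations the recipe describes: keep only countries whose GNI per capita is above 2000, then sort the survivors by GNI. It is an illustration, not the Pig script itself. The function name `countries_above_gni`, the two-column layout (country, GNI), and the sample rows are all hypothetical; the real hdi-data.csv has more columns, and the values below are made up rather than taken from the HDR data. Sorting is ascending, which is what Pig's ORDER ... BY does when DESC is not given.

```python
import csv
import io

# Hypothetical sample rows for illustration only; these are NOT real HDR figures.
# Assumed layout: country name, GNI per capita (the real file has more columns).
SAMPLE_CSV = """CountryA,1500
CountryB,42000
CountryC,2000
CountryD,8700
"""


def countries_above_gni(csv_text, threshold=2000):
    """Return (country, gni) pairs with gni > threshold, sorted by gni ascending.

    Hypothetical helper mirroring a Pig FILTER ... BY gni > threshold
    followed by ORDER ... BY gni.
    """
    rows = [(country, int(gni)) for country, gni in csv.reader(io.StringIO(csv_text))]
    # FILTER step: strictly greater than the threshold ("more than 2000$")
    filtered = [(country, gni) for country, gni in rows if gni > threshold]
    # ORDER BY step: ascending by GNI value
    return sorted(filtered, key=lambda row: row[1])


if __name__ == "__main__":
    for country, gni in countries_above_gni(SAMPLE_CSV):
        print(country, gni)
```

Note that a row with a GNI of exactly 2000 is dropped, because the condition is "more than 2000".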