This recipe explains how to run a MapReduce job that reads its input directly from HBase storage and writes its output directly back to it.
HBase provides abstract mapper and reducer implementations (TableMapper and TableReducer) that users can extend to read from and write to HBase directly. This recipe explains how to write a sample MapReduce application using these mappers and reducers.
We will use the United Nations Development Programme's (UNDP) Human Development Report (HDR) data, by country, which reports the Gross National Income (GNI) per capita of each country. The dataset can be found at http://hdr.undp.org/en/statistics/data/. A sample of this dataset is available in the chapter7/resources/hdi-data.csv file in the sample source code repository. Using MapReduce, we will calculate the average GNI per capita, by country.
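Before bringing HBase into the picture, it may help to see the computation itself in isolation. The following is a minimal sketch in plain Java, with no Hadoop or HBase dependencies, that mimics the map step (emit a country and a GNI value per record) and the reduce step (average the values collected for each country). The class name GniAverageSketch, the method names, and the two-column "country,gni" record layout are assumptions made for illustration; they are not the layout of hdi-data.csv or the code used in this recipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the map and reduce logic only.
class GniAverageSketch {

    // "Map" step: turn one record into a (country, GNI) pair.
    // ASSUMPTION: each record looks like "country,gni"; the real file may differ.
    static Map.Entry<String, Double> map(String line) {
        String[] fields = line.split(",");
        return Map.entry(fields[0].trim(), Double.parseDouble(fields[1].trim()));
    }

    // "Reduce" step: average all GNI values gathered for a single country.
    static double reduce(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Stands in for the framework: groups mapped pairs by key, then reduces each group.
    static Map<String, Double> averageByCountry(List<String> lines) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Double> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            averages.put(e.getKey(), reduce(e.getValue()));
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up records, not values taken from the HDR dataset.
        List<String> sample = List.of("CountryA,100.0", "CountryA,200.0", "CountryB,50.0");
        System.out.println(averageByCountry(sample)); // prints {CountryA=150.0, CountryB=50.0}
    }
}
```

In an actual HBase MapReduce job, the grouping done here by averageByCountry is performed by the framework's shuffle phase, and the records would come from HBase rows rather than from strings.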
This recipe requires an Apache HBase installation integrated with a Hadoop YARN cluster. Make sure to start all the configured HBase Master and RegionServer processes before we begin.