Hive provides a SQL-like query layer for the data stored in HDFS utilizing Hadoop MapReduce underneath. Amazon EMR supports executing Hive queries on the data stored in S3. Refer to the Apache Hive recipes in Chapter 6, Hadoop Ecosystem – Apache Hive, for more information on using Hive for large-scale data analysis.
In this recipe, we are going to execute a Hive script to perform the computation we did in the Executing a Pig script using EMR recipe earlier. We will use the Human Development Reports data (http://hdr.undp.org/en/statistics/data/) to print names of countries that have a GNI value greater than $2000 of gross national income per capita (GNI) sorted by GNI.