Hadoop MapReduce v2 Cookbook - Second Edition
Hive provides a SQL-like query layer over the data stored in HDFS and runs its queries as Hadoop MapReduce jobs underneath. Amazon EMR supports executing Hive queries on the data stored in S3. Refer to the Apache Hive recipes in Chapter 6, Hadoop Ecosystem – Apache Hive, for more information on using Hive for large-scale data analysis.
In this recipe, we are going to execute a Hive script that performs the same computation as the earlier Executing a Pig script using EMR recipe. We will use the Human Development Reports data (http://hdr.undp.org/en/statistics/data/) to print the names of countries that have a gross national income (GNI) per capita greater than $2000, sorted by GNI.
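To make the intended computation concrete before running it on EMR, the filter-and-sort logic can be sketched locally in plain Python. This is only an illustration of what the query is meant to compute, not the recipe's Hive script: the column names `country` and `gni`, the helper `countries_above`, and the sample rows are all hypothetical placeholders, and ascending sort order is an assumption.

```python
import csv
import io

# Made-up placeholder rows, NOT real Human Development Reports values.
# The column names "country" and "gni" are assumptions for this sketch.
SAMPLE_CSV = """country,gni
Country A,1500
Country B,32000
Country C,2500
Country D,2000
"""


def countries_above(csv_text, threshold=2000):
    """Return (country, gni) pairs with gni > threshold, sorted by gni ascending."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(record["country"], float(record["gni"])) for record in reader]
    # Keep only rows strictly above the threshold, mirroring "greater than $2000".
    selected = [row for row in rows if row[1] > threshold]
    # Sort by the GNI value; ascending order is assumed here.
    return sorted(selected, key=lambda row: row[1])


result = countries_above(SAMPLE_CSV)
for name, gni in result:
    print(name, gni)
```

Note that the comparison is strict, so a country with a GNI of exactly 2000 is excluded. In Hive, the same idea would typically be expressed as a `WHERE` filter followed by an `ORDER BY` clause.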
The following steps show how to use a Hive script with Amazon Elastic MapReduce to query a dataset stored on Amazon S3:
resources/hdi-data.csv file...