Hadoop MapReduce v2 Cookbook - Second Edition: RAW


Executing a Pig script using EMR


Amazon EMR supports executing Apache Pig scripts on the data stored in S3. Refer to the Pig-related recipes in Chapter 7, Hadoop Ecosystem II – Pig, HBase, Mahout, and Sqoop, for more details on using Apache Pig for data analysis.

In this recipe, we are going to execute a simple Pig script using Amazon EMR. This sample will use the Human Development Reports data (http://hdr.undp.org/en/statistics/data/) to print the names of countries that have a gross national income per capita (GNI) greater than $2000, sorted by GNI.
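
Before running anything on EMR, it can help to see the computation in miniature. The following is a minimal local sketch in Python of the same filter-then-sort logic, not the recipe's Pig script. The two-column layout (`country`, `gni`), the placeholder country names, and the GNI values are all made up for illustration; the real hdi-data.csv file may be laid out differently.

```python
import csv
import io

# Hypothetical sample data: placeholder names and invented GNI values.
# The actual column layout of hdi-data.csv is an assumption here.
SAMPLE_CSV = """country,gni
CountryA,47000
CountryB,1100
CountryC,3500
CountryD,600
CountryE,10000
"""


def countries_above_gni(csv_text, threshold=2000):
    """Return (country, gni) pairs with gni > threshold, sorted by gni ascending."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Parse each record into a (name, integer GNI) pair.
    rows = [(record["country"], int(record["gni"])) for record in reader]
    # Keep only the records above the threshold (the filter step).
    filtered = [row for row in rows if row[1] > threshold]
    # Order the remaining records by GNI (the sort step).
    return sorted(filtered, key=lambda row: row[1])


if __name__ == "__main__":
    for country, gni in countries_above_gni(SAMPLE_CSV):
        print(country, gni)
```

In Pig, the filter and sort steps would typically be expressed with the `FILTER ... BY` and `ORDER ... BY` operators, with the data loaded from and stored to S3 paths rather than an in-memory string.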

How to do it...

The following steps show you how to use a Pig script with Amazon Elastic MapReduce to process a dataset stored on Amazon S3:

  1. Use the Amazon S3 console to create a bucket in S3 for the input data. Upload the resources/hdi-data.csv file from the source repository for this chapter to the newly created bucket. Alternatively, you can use an existing bucket or a directory inside a bucket. We assume the S3 path for the uploaded...