Hadoop MapReduce Cookbook

By: Srinath Perera, Thilina Gunarathne

Overview of this book

We are facing an avalanche of data. The unstructured data we gather can contain many insights that might hold the key to business success or failure. Harnessing the ability to analyze and process this data with Hadoop MapReduce is one of the most highly sought-after skills in today's job market.

"Hadoop MapReduce Cookbook" is a one-stop guide to processing large and complex data sets using the Hadoop ecosystem. The book introduces you to simple examples and then dives deep to solve in-depth big data use cases.

"Hadoop MapReduce Cookbook" presents more than 50 ready-to-use Hadoop MapReduce recipes in a simple and straightforward manner, with step-by-step instructions and real-world examples.

Start with how to install, then configure, extend, and administer Hadoop. Then write simple examples, learn MapReduce patterns, harness the Hadoop landscape, and finally jump to the cloud.

The book deals with many exciting topics such as setting up Hadoop security and using MapReduce to solve analytics, classification, online marketing, recommendation, and search use cases. You will learn how to harness components from the Hadoop ecosystem including HBase, Hadoop, Pig, and Mahout, then learn how to set up cloud environments to perform Hadoop MapReduce computations.

"Hadoop MapReduce Cookbook" teaches you how to process large and complex data sets using real examples, providing a comprehensive guide to getting things done with Hadoop MapReduce.

Running your first Pig command


This recipe runs a basic Pig script. As the sample dataset, we will use the Human Development Report (HDR) data by country, which shows the Gross National Income (GNI) per capita of each country. The dataset is available at http://hdr.undp.org/en/statistics/data/. This recipe uses Pig to process the dataset and create a list of countries that have a GNI per capita of more than $2,000, sorted by the GNI value.
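Before turning to the Pig script itself, it may help to see the filter-and-sort logic described above in plain Python. This is only a rough sketch under stated assumptions: the two-column rows, the placeholder country names, the function name `countries_above`, and the ascending sort order are all hypothetical and are not taken from the HDR dataset or from the book's sample code.

```python
# Hypothetical sample rows: (country, GNI per capita in USD).
# Placeholder values for illustration only; not taken from the HDR dataset.
rows = [
    ("CountryA", 1500),
    ("CountryB", 32000),
    ("CountryC", 2000),
    ("CountryD", 4800),
]


def countries_above(rows, threshold=2000):
    """Keep rows whose GNI is strictly above the threshold, sorted by GNI.

    Ascending order is an assumption; the recipe text only says the
    result is sorted by the GNI value.
    """
    kept = [row for row in rows if row[1] > threshold]
    return sorted(kept, key=lambda row: row[1])


result = countries_above(rows)
for country, gni in result:
    print(country, gni)
```

Note that a country with a GNI of exactly 2000 is dropped, because the recipe asks for countries with more than that amount. Pig expresses the same two steps declaratively and runs them over data that can be far larger than a single machine's memory.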

How to do it...

This section describes how to use a Pig Latin script to find the countries in the HDR dataset that have a GNI per capita of more than $2,000, sorted by GNI.

  1. From the sample code, copy the dataset resources/chapter5/hdi-data.csv to the PIG_HOME/bin directory.

  2. From the sample code, copy the Pig script resources/chapter5/countryFilter.pig to the PIG_HOME/bin directory.

  3. Open the Pig script in your favorite editor. It will look like the following:

    A = load 'hdi-data.csv' using PigStorage(',')  AS (id:int, country:chararray, hdi:float, lifeex:int, mysch:int, eysch...