Mastering Hadoop

By: Sandeep Karanth

Chapter 3. Advanced Pig

Running Java MapReduce jobs on Hadoop provides the most flexibility with the least abstraction. However, abstractions are necessary to infer patterns, accomplish common data manipulation tasks, reduce complexity, and flatten the learning curve. Pig is a platform that provides a framework and high-level abstractions to build MapReduce programs for Hadoop. Its scripting language, Pig Latin, is comparable to SQL in terms of operator capabilities.

Developed at Yahoo! around 2006, Pig was used as a framework to specify ad hoc MapReduce workflows. In the following year, it was moved to the Apache Software Foundation. At the time of writing, the latest release of Pig is 0.12.1.

Tip

The official release of Pig is currently incompatible with Hadoop 2.2.0 because it expects libraries from Hadoop 1.2.1. Running any Pig script fails with the following exception:

Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext...