Learning Hadoop 2

An overview of Pig

Historically, the Pig toolkit consisted of a compiler that generated MapReduce programs, bundled their dependencies, and executed them on Hadoop. Pig jobs are written in a language called Pig Latin and can be run either interactively or in batch mode. Furthermore, Pig Latin can be extended with User Defined Functions (UDFs) written in Java, Python, Ruby, Groovy, or JavaScript.
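As a rough illustration of the UDF extension point, here is a minimal sketch of a Python UDF. The function name normalize and the schema string are hypothetical and chosen only for this example. The fallback decorator is an assumption that lets the file load outside Pig: when Pig runs the script, it supplies outputSchema itself.

```python
# Minimal sketch of a Pig UDF written in Python.
# Assumption: Pig provides the outputSchema decorator at runtime; the
# fallback below exists only so the file can be imported and tested
# without Pig on the path.
try:
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        # Stand-in decorator: record the schema string on the function.
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("word:chararray")
def normalize(word):
    """Trim surrounding whitespace and lowercase a single field."""
    if word is None:
        # Pig passes null fields as None; propagate them unchanged.
        return None
    return word.strip().lower()
```

In a Pig Latin script, a file like this would typically be loaded with a REGISTER statement that names the script, the scripting engine, and a namespace alias, after which the function can be called inside a FOREACH ... GENERATE expression. The exact engine keyword depends on the Pig version and on which Python runtime is used.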

Pig use cases include the following:

  • Data processing

  • Ad hoc analytical queries

  • Rapid prototyping of algorithms

  • Extract, Transform, Load (ETL) pipelines

Following a trend we have seen in previous chapters, Pig is moving towards a general-purpose computing architecture. As of version 0.13, the ExecutionEngine interface (org.apache.pig.backend.executionengine) acts as a bridge between the frontend and the backend of Pig, allowing Pig Latin scripts to be compiled and executed on frameworks other than MapReduce. At the time of writing, version 0.13 ships with MRExecutionEngine (org.apache.pig.backend.hadoop...