Learning Hadoop 2

Running Pig


Pig is a tool that translates statements written in Pig Latin and executes them either on a single machine (standalone mode) or on a full Hadoop cluster (distributed mode). Even in distributed mode, Pig's role is only to translate Pig Latin statements into MapReduce jobs, so it doesn't require the installation of additional services or daemons. It is used as a command-line tool with its associated libraries.

Cloudera CDH ships with Apache Pig version 0.12. Alternatively, the Pig source code and binary distributions can be obtained at https://pig.apache.org/releases.html.

As can be expected, MapReduce mode requires access to a Hadoop cluster and an HDFS installation. It is the default mode used when running the pig command at the command-line prompt. Scripts can be executed with the following command:

$ pig -f <script>

Parameters can be passed via the command line using -param <param>=<val>, as follows:

$ pig -param input=tweets.txt
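
As a minimal sketch of how -param and -f fit together, the following shell snippet writes a small Pig Latin script that refers to the parameter as $input and then runs it. The script name count_tweets.pig, the alias names, and the choice of -x local (which selects Pig's single-machine mode instead of the default MapReduce mode) are assumptions made for illustration and are not taken from the book's examples.

```shell
# Write a hypothetical Pig Latin script. The quoted 'EOF' keeps the shell
# from expanding $input, so Pig itself substitutes it at launch time.
cat > count_tweets.pig <<'EOF'
-- $input is filled in from: -param input=<path>
tweets  = LOAD '$input' AS (line:chararray);
grouped = GROUP tweets ALL;
total   = FOREACH grouped GENERATE COUNT(tweets);
DUMP total;
EOF

# Run the script only if pig is installed and the input file exists.
# -x local runs on a single machine; omit it to use the default MapReduce mode.
if command -v pig >/dev/null 2>&1 && [ -f tweets.txt ]; then
  pig -x local -param input=tweets.txt -f count_tweets.pig
else
  echo "skipped: would run pig -x local -param input=tweets.txt -f count_tweets.pig"
fi
```

Because the value is supplied on the command line, the same script can be pointed at a different file by changing only the -param argument.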

Parameters...