Mastering Hadoop

By: Sandeep Karanth

Development and debugging aids

There are three important commands that can help develop, debug, and optimize Pig scripts.

The DESCRIBE command

The DESCRIBE command prints the schema of a relation. It is useful when you are new to Pig Latin and want to see how each operator transforms the data. For the groupByCountry relation in the previous script, which finds the population of each country, the output is as follows:

groupByCountry: {group: chararray,generateRecords: {(cc::cname: chararray,ccity::cityName: chararray,ccity::population: long)}} 

The DESCRIBE output uses Pig Latin schema syntax. In the preceding example, groupByCountry is a bag whose tuples contain a group element and a nested bag, generateRecords.
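To see where such a schema comes from, the following is a minimal sketch of a script that could produce a relation of this shape. Only the relation and field names that appear in the schema above (cc, ccity, cname, cityName, population, generateRecords, groupByCountry) come from the text. The file names, the delimiter, the ccode join key, and the joined alias are assumptions made for illustration and may differ from the script used earlier in the book.

```pig
-- Hypothetical inputs: file names, delimiter, and the ccode field are assumed.
cc = LOAD 'countrycodes.txt' USING PigStorage(',')
     AS (ccode:chararray, cname:chararray);
ccity = LOAD 'worldcitiespop.txt' USING PigStorage(',')
     AS (ccode:chararray, cityName:chararray, population:long);

-- Joining two relations prefixes each field with its source alias (cc::, ccity::).
joined = JOIN cc BY ccode, ccity BY ccode;

-- Keep only the fields that show up in the nested bag of the schema.
generateRecords = FOREACH joined GENERATE cc::cname, ccity::cityName, ccity::population;

-- GROUP yields one tuple per key: the key itself (group) plus a bag
-- named after the grouped relation (generateRecords).
groupByCountry = GROUP generateRecords BY cname;

-- Print the schema of the grouped relation.
DESCRIBE groupByCountry;
```

Running DESCRIBE on the intermediate relations (joined, generateRecords) in the same way is a quick way to check how each operator changes the schema before the grouping step.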

The EXPLAIN command

The EXPLAIN command, applied to a relation, shows how the Pig script will be executed. It is useful when optimizing Pig scripts or debugging errors, and it shows the logical, physical, and MapReduce plans for the relation. The following screenshot shows the MapReduce...