Mastering Hadoop

By: Karanth

Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not working as fast as you intend? Are you looking to understand the benefits of upgrading Hadoop? If the answer is yes to any of these, this book is for you. It assumes novice-level familiarity with Hadoop.

Pig performance optimizations


In this section, we will look at different performance parameters and how to tune them for optimized Pig script execution.

The optimization rules

Pig applies optimization rules to the logical plan generated for a Pig script. All rules are enabled by default. To disable rules, either set the pig.optimizer.rules.disabled property or pass the -optimizer_off command-line option when executing the script. Some rules are mandatory and cannot be disabled. The value all disables every non-mandatory rule. The property is set as follows:

set pig.optimizer.rules.disabled <comma-separated rules list>

Alternatively, you can use the following command:

pig -t|-optimizer_off [rule name | all]
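
As a minimal sketch of the property form, the following script disables two optional rules before running a simple filter. The rule names SplitFilter and MergeFilter are picked only for illustration, and the input path, output path, and schema are hypothetical:

```pig
-- Disable two non-mandatory rules for this script only.
-- The rule names are an illustrative choice, not a recommendation.
set pig.optimizer.rules.disabled 'SplitFilter,MergeFilter';

-- Hypothetical input path and schema.
logs = LOAD 'input/logs' USING PigStorage('\t') AS (user:chararray, bytes:long);

-- A compound filter condition, the kind of expression filter rules rewrite.
big = FILTER logs BY bytes > 1024 AND user IS NOT NULL;

-- Print the plans so the effect of the disabled rules can be inspected.
EXPLAIN big;

STORE big INTO 'output/big_requests';
```

Comparing the EXPLAIN output with and without the set statement is one way to see what a given rule changes in the plan.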

Tip

FilterLogicExpressionSimplifier is turned off by default. To turn it on, set the pig.exec.filterLogicExpressionSimplifier property to true.
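
A sketch of enabling the rule from inside a script is shown next. It assumes the property is picked up when supplied through a set statement, as pig.optimizer.rules.disabled is; if that does not hold in your Pig version, the property would need to be supplied through Pig's properties configuration instead:

```pig
-- Opt in to FilterLogicExpressionSimplifier, which is off by default.
-- Assumption: the property is honored when given via set.
set pig.exec.filterLogicExpressionSimplifier true;
```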

Most of the optimization rules discussed in the following section are simple and are borrowed from database query optimization:

  • PartitionFilterOptimizer...