
Mastering Hadoop

By: Karanth

Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not working as fast as you intend? Are you looking to understand the benefits of upgrading Hadoop? If the answer is yes to any of these, this book is for you. It assumes novice-level familiarity with Hadoop.
Table of Contents (15 chapters)

Hive query optimizers


After type checking and semantic analysis of the query, Hive applies a number of rule-based transformations to optimize it. Some of these optimizations are discussed here. Custom optimizations can be written by implementing the org.apache.hadoop.hive.ql.optimizer.Transform interface. This interface has a single method, transform(), which takes a ParseContext object and returns another ParseContext after the transformation. The ParseContext object holds the current operator tree, among other information.
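To make the context-in, context-out contract concrete, here is a minimal, self-contained Java model of it. It deliberately does not use Hive's classes: ParseContextModel, TransformModel, RemoveIdentitySelect, and OptimizerModel are hypothetical stand-ins, the operator tree is flattened into a list of operator labels, and the rule shown (dropping an identity SEL(*) projection) is only an illustration. A real custom optimization would implement org.apache.hadoop.hive.ql.optimizer.Transform against the Hive jars and rewrite the operator tree held in ParseContext.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's ParseContext. In this sketch it only
// carries a flattened, textual view of the operator tree.
final class ParseContextModel {
    final List<String> operatorTree;

    ParseContextModel(List<String> operatorTree) {
        this.operatorTree = List.copyOf(operatorTree);
    }
}

// Hypothetical interface with the same shape as Transform: one method that
// takes a context and returns the (possibly new) context.
interface TransformModel {
    ParseContextModel transform(ParseContextModel pctx);
}

// Hypothetical custom rule: drop identity projections, written "SEL(*)" here.
final class RemoveIdentitySelect implements TransformModel {
    @Override
    public ParseContextModel transform(ParseContextModel pctx) {
        List<String> rewritten = new ArrayList<>();
        for (String op : pctx.operatorTree) {
            if (!op.equals("SEL(*)")) {
                rewritten.add(op);
            }
        }
        return new ParseContextModel(rewritten);
    }
}

// Applies each registered rule in order, feeding each one the output of the previous one.
final class OptimizerModel {
    private final List<TransformModel> transforms = new ArrayList<>();

    OptimizerModel add(TransformModel transform) {
        transforms.add(transform);
        return this;
    }

    ParseContextModel optimize(ParseContextModel pctx) {
        ParseContextModel current = pctx;
        for (TransformModel transform : transforms) {
            current = transform.transform(current);
        }
        return current;
    }
}

final class TransformDemo {
    public static void main(String[] args) {
        ParseContextModel before = new ParseContextModel(
                List.of("TS[src]", "SEL(*)", "FIL[key > 10]", "FS"));
        ParseContextModel after = new OptimizerModel()
                .add(new RemoveIdentitySelect())
                .optimize(before);
        System.out.println(after.operatorTree); // [TS[src], FIL[key > 10], FS]
    }
}
```

Because each rule receives the tree exactly as the previous rule left it, the order in which rules are registered can change the final plan.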

The following are a few of the optimizations already available in Hive 0.13.0:

  • ColumnPruner: The operator tree is walked to determine the minimal set of base-table columns required to fulfill the query. Any other columns in the base table are pruned away by inserting a SELECT operator when the base tables are read. This reduces the amount of data read, processed, and written.
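The column-pruning walk can be sketched in a few lines of Java. This is a simplified model under stated assumptions, not Hive's ColumnPruner: the operator tree is reduced to a single linear chain, PrunerOp and ColumnPrunerModel are hypothetical names, and each operator simply declares the columns it references. Walking from the top of the chain toward the table scan, a projection resets the set of needed columns while other operators (such as filters) add to it; whatever remains is the set of base-table columns that must be read.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified operator: a name, the columns its expressions
// reference, and whether it is a projection (a SELECT) whose output
// replaces every column that flows above it.
final class PrunerOp {
    final String name;
    final Set<String> referencedColumns;
    final boolean isProjection;

    PrunerOp(String name, List<String> referencedColumns, boolean isProjection) {
        this.name = name;
        this.referencedColumns = new LinkedHashSet<>(referencedColumns);
        this.isProjection = isProjection;
    }
}

final class ColumnPrunerModel {
    // Walks a linear operator chain from the top (closest to the output) down
    // to the table scan and returns the base-table columns that must be read,
    // in the table's own column order.
    static Set<String> requiredColumns(List<PrunerOp> chainTopDown, List<String> tableColumns) {
        Set<String> needed = new LinkedHashSet<>();
        for (PrunerOp op : chainTopDown) {
            if (op.isProjection) {
                // Everything above a projection is produced by it, so below it
                // only the columns its expressions reference are still needed.
                needed = new LinkedHashSet<>(op.referencedColumns);
            } else {
                // Filters and similar operators pass rows through and add
                // the columns they evaluate.
                needed.addAll(op.referencedColumns);
            }
        }
        Set<String> toRead = new LinkedHashSet<>();
        for (String column : tableColumns) {
            if (needed.contains(column)) {
                toRead.add(column);
            }
        }
        return toRead;
    }
}

final class ColumnPrunerDemo {
    public static void main(String[] args) {
        // Models: SELECT name FROM users WHERE age > 30
        List<PrunerOp> chain = List.of(
                new PrunerOp("SEL", List.of("name"), true),
                new PrunerOp("FIL", List.of("age"), false));
        List<String> users = List.of("id", "name", "age", "email", "address");
        System.out.println(ColumnPrunerModel.requiredColumns(chain, users)); // [name, age]
    }
}
```

For the modeled query, only name and age are read from the hypothetical users table; id, email, and address are pruned before any rows reach the filter.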

  • GlobalLimitOptimizer: When a LIMIT operator is used in a query, this particular optimizer...