Mastering Hadoop

By: Sandeep Karanth

Best practices

The optimization rules in the previous section rewrite the logical plan of a Pig script to improve performance, and knowing them helps you write efficient scripts. A few other practices can also speed up Pig scripts. These best practices cannot be turned into optimizer rules because they depend on the application and the data. In addition, the optimizer is conservative, so there is no guarantee that a given rule will actually be applied.

The explicit usage of types

Pig supports many types, both primitive and complex. Declaring types explicitly can speed up your scripts, sometimes by up to 2x. For example, Pig treats any numerical computation on fields without a declared type as a double computation. A double in Pig takes up 8 bytes of storage, while an int takes up 4 bytes, and computation on int values is faster than computation on double values.
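
As a rough sketch of the difference, the two queries below perform the same addition, first without and then with declared types. The input name 'visits' and the field names (user, clicks, purchases) are hypothetical and chosen only for illustration. The example assumes the default tab-delimited loader.

```pig
-- Query 1: no types declared. clicks and purchases default to bytearray,
-- so the addition is carried out as a double computation.
untyped = LOAD 'visits' AS (user, clicks, purchases);
untyped_totals = FOREACH untyped GENERATE user, clicks + purchases;

-- Query 2: types declared in the schema. The same addition is now
-- an int computation on 4-byte values.
typed = LOAD 'visits' AS (user:chararray, clicks:int, purchases:int);
typed_totals = FOREACH typed GENERATE user, clicks + purchases AS total;
```

Running DESCRIBE on untyped_totals and typed_totals is one way to check which type Pig has inferred for the computed field in each case.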

Early and frequent projection

As we saw with the AddForEach and ColumnMapKeyPrune optimizers, it is a good practice to project only...