The optimization rules in the previous section change the logical plan of a Pig script to improve performance, and knowing them helps you write efficient scripts. A few other practices can also speed up Pig scripts. These best practices cannot be turned into rules because they are specific to the application and the data. Also, the optimization rules tend to be conservative, so a rule is not guaranteed to be applied in every case where it could help.
Pig supports many types, both primitive and complex. Declaring types explicitly can speed up your scripts, sometimes by as much as 2x. For example, in Pig, all numerical computations on fields without a declared type are treated as double computations. The double type in Pig takes up 8 bytes of storage, while an int type takes up 4 bytes. Computation on int values is also faster than computation on double values.
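As a minimal sketch of this practice, the following script loads the same data twice, once without types and once with an explicit schema. The file name `sales.txt` and the field names `store` and `qty` are hypothetical and used only for illustration; the input is assumed to be tab-delimited, which is what the default loader expects.

```pig
-- Hypothetical input: a tab-delimited file 'sales.txt' holding a store id and a quantity.

-- Without types: both fields default to bytearray, so SUM is computed as a double.
untyped     = LOAD 'sales.txt' AS (store, qty);
untyped_grp = GROUP untyped BY store;
untyped_sum = FOREACH untyped_grp GENERATE group, SUM(untyped.qty);

-- With types: qty is declared as int, so SUM uses integer arithmetic (the result is a long).
typed     = LOAD 'sales.txt' AS (store:chararray, qty:int);
typed_grp = GROUP typed BY store;
typed_sum = FOREACH typed_grp GENERATE group, SUM(typed.qty);

-- DESCRIBE shows the resulting schemas, which makes the type difference visible.
DESCRIBE untyped_sum;
DESCRIBE typed_sum;
```

Any actual speedup depends on the data and the computation, so it is worth measuring both variants on a representative sample before relying on the difference.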