In this chapter, we went through the advanced features of Pig. We looked into the optimizations that Pig has to offer. The following are a few key takeaways from this chapter:
As a rule, try to use Pig in as many situations as you can. Pig's abstractions, development aids, and flexibility can save you both time and money. Stretch Pig's capabilities before falling back to hand-written MapReduce jobs.
The logical plan optimizations might change the order of statement execution. Use EXPLAIN and ILLUSTRATE extensively to study Pig scripts.
Help Pig to execute your script faster by following some of the guidelines mentioned in this chapter. Try to make your UDFs implement the Algebraic or Accumulator interface, ideally both.
Understand the data you are trying to process. Specialized support is available for some kinds of data quirks, such as skewed joins for joins on skewed data.
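The takeaways above can be tried together in one short script. The following is a minimal sketch and is not taken from the chapter: the file names, relation names, and schemas are hypothetical, and it assumes a Pig installation where those input files exist.

```pig
-- Hypothetical inputs: file names and schemas are illustrative only.
clicks = LOAD 'clicks.tsv' AS (user_id:chararray, url:chararray);
users  = LOAD 'users.tsv'  AS (user_id:chararray, country:chararray);

-- Skewed join: intended for the case where a few user_id values
-- account for most of the rows in clicks.
joined = JOIN clicks BY user_id, users BY user_id USING 'skewed';

-- COUNT is a built-in function that implements Algebraic,
-- so Pig can make use of the combiner for this aggregation.
by_country = GROUP joined BY users::country;
counts     = FOREACH by_country GENERATE group AS country, COUNT(joined) AS n;

-- Print the logical, physical, and MapReduce plans for counts.
EXPLAIN counts;

-- Run a simple pipeline on a small sample and show the data at each step.
by_url = GROUP clicks BY url;
hits   = FOREACH by_url GENERATE group AS url, COUNT(clicks) AS n;
ILLUSTRATE hits;
```

Comparing the EXPLAIN output with and without USING 'skewed' is one way to see how the choice of join changes the plan Pig generates.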
In the next chapter, we will look at advanced features of a higher-level SQL abstraction on Hadoop MapReduce called Hive.