
Mastering Hadoop

By: Sandeep Karanth


Summary


In this chapter, we went through the advanced features of Pig. We looked into the optimizations that Pig has to offer. The following are a few key takeaways from this chapter:

  • As a rule, use Pig in as many situations as you can. Pig's abstractions, development aids, and flexibility can save you both time and money, so stretch Pig's capabilities before falling back to hand-written MapReduce jobs.

  • Pig's logical plan optimizations might change the order in which statements execute. Use EXPLAIN and ILLUSTRATE extensively to study how Pig interprets and runs your scripts.

  • Help Pig execute your script faster by following the guidelines mentioned in this chapter. Where possible, make your UDFs implement the Algebraic or Accumulator interface, ideally both.

  • Understand the data you are trying to process. Pig offers specialized support for some data quirks, such as skewed joins for joins on skewed data.
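To make the Algebraic and Accumulator takeaway concrete, here is a minimal sketch of the idea behind both contracts, using an average as the example. It is a plain-Java model with no Pig dependency. The class, record, and method names (`AlgebraicAverageSketch`, `Partial`, `initial`, `intermed`, `fin`, `AverageAccumulator`) are hypothetical and only mirror the shape of the real interfaces.

In an actual Pig UDF, the class would extend `EvalFunc` and implement:

  • `org.apache.pig.Algebraic`, whose `getInitial()`, `getIntermed()`, and `getFinal()` return the class names of the three phase functions.
  • `org.apache.pig.Accumulator`, with `accumulate()`, `getValue()`, and `cleanup()`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java model (hypothetical names, no Pig dependency) of why an average
 * benefits from Pig's Algebraic and Accumulator contracts.
 */
public class AlgebraicAverageSketch {

    /** Partial state passed between phases: a running sum and count. */
    record Partial(double sum, long count) {
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    // Initial phase (map side): turn one input value into a partial.
    static Partial initial(double value) {
        return new Partial(value, 1);
    }

    // Intermediate phase (combiner): merge the partials from one mapper.
    static Partial intermed(List<Partial> partials) {
        Partial acc = new Partial(0.0, 0);
        for (Partial p : partials) {
            acc = acc.merge(p);
        }
        return acc;
    }

    // Final phase (reducer): merge combiner outputs, then divide once.
    static double fin(List<Partial> partials) {
        Partial total = intermed(partials);
        return total.count() == 0 ? Double.NaN : total.sum() / total.count();
    }

    /** Accumulator style: consume a group's bag in chunks with O(1) state. */
    static final class AverageAccumulator {
        private double sum;
        private long count;

        void accumulate(List<Double> chunk) {
            for (double v : chunk) {
                sum += v;
                count++;
            }
        }

        double getValue() {
            return count == 0 ? Double.NaN : sum / count;
        }

        // Reset state before the next group key.
        void cleanup() {
            sum = 0.0;
            count = 0;
        }
    }

    public static void main(String[] args) {
        // Two "mappers", each seeing part of the values for one group key.
        List<Partial> m1 = new ArrayList<>();
        for (double v : new double[] {1.0, 2.0, 3.0}) m1.add(initial(v));
        List<Partial> m2 = new ArrayList<>();
        for (double v : new double[] {4.0, 5.0}) m2.add(initial(v));

        // Combiners shrink each mapper's output to a single partial.
        List<Partial> shuffled = List.of(intermed(m1), intermed(m2));
        System.out.println("algebraic avg = " + fin(shuffled));

        AverageAccumulator acc = new AverageAccumulator();
        acc.accumulate(List.of(1.0, 2.0, 3.0));
        acc.accumulate(List.of(4.0, 5.0));
        System.out.println("accumulator avg = " + acc.getValue());
        acc.cleanup();
    }
}
```

The design choice is the same in both halves: carry a small (sum, count) state instead of the raw values. That state is what lets Pig run the combiner for Algebraic functions and hand the bag over in chunks for Accumulator functions, so no single reducer has to hold a whole group in memory.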

In the next chapter, we will look at the advanced features of Hive, a higher-level SQL abstraction on Hadoop MapReduce.