Mastering Hadoop

By: Sandeep Karanth

UDF, UDAF, and UDTF


As in Pig, UDFs are one of the most important extensibility features in Hive. Writing a UDF in Hive is simpler, but the interfaces do not declare every method that must be overridden to make the UDF complete. This is because a UDF can take any number of parameters, which makes a fixed interface difficult to provide. Instead, Hive uses Java reflection under the hood when executing the UDF to determine the function's parameter list.
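The reflection-based lookup described above can be illustrated with a minimal plain-Java sketch. This is not Hive's own resolver, and it does not use any Hive classes: the names ReflectiveDispatchSketch, PadOrUpper, and call are hypothetical, chosen only to show how a method named evaluate can be located from the runtime types of its arguments when no fixed interface declares it.

```java
import java.lang.reflect.Method;

// Illustrative only: hypothetical names, no Hive dependency.
class ReflectiveDispatchSketch {

    // Stand-in for a UDF: no fixed interface, just overloaded evaluate() methods.
    public static class PadOrUpper {
        // One-argument overload: upper-cases the input.
        public String evaluate(String s) {
            return s == null ? null : s.toUpperCase();
        }

        // Two-argument overload: right-pads the input with spaces up to width.
        public String evaluate(String s, Integer width) {
            if (s == null || width == null) {
                return null;
            }
            StringBuilder sb = new StringBuilder(s);
            while (sb.length() < width) {
                sb.append(' ');
            }
            return sb.toString();
        }
    }

    // Looks up the evaluate() overload whose parameter types equal the
    // runtime classes of the (non-null) arguments, then invokes it.
    static Object call(Object udf, Object... args) {
        Class<?>[] argTypes = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            argTypes[i] = args[i].getClass();
        }
        try {
            Method m = udf.getClass().getMethod("evaluate", argTypes);
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No matching evaluate() overload", e);
        }
    }

    public static void main(String[] unused) {
        PadOrUpper udf = new PadOrUpper();
        System.out.println(call(udf, "hive"));            // one-argument overload
        System.out.println("[" + call(udf, "ab", 4) + "]"); // two-argument overload
    }
}
```

The sketch matches parameter types exactly, so it is far simpler than a real resolver would need to be; it only shows why overloaded evaluate methods can coexist without an interface that fixes the parameter list.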

Hive has the following three kinds of UDFs:

  • Regular UDFs: These take a single row as input and produce a single row as output after applying the custom logic.

  • UDAFs: These are aggregators that take multiple rows as input but output a single row. SUM and COUNT are examples of built-in UDAFs.

  • UDTFs: These are generator functions that take a single row as input and produce multiple rows as output. The EXPLODE function is a UDTF.
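The difference between the three kinds is one of input and output cardinality, which can be sketched in plain Java without any Hive classes. The class UdfShapesSketch and its methods upper, sum, and explode are hypothetical names that only mimic the shape of each kind; they are not how Hive implements SUM or EXPLODE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, no Hive dependency.
class UdfShapesSketch {

    // Regular UDF shape: one input row in, one output row out.
    static String upper(String value) {
        return value.toUpperCase();
    }

    // UDAF shape: many input rows in, one output row out (in the spirit of SUM).
    static long sum(List<Integer> rows) {
        long total = 0;
        for (int r : rows) {
            total += r;
        }
        return total;
    }

    // UDTF shape: one input row in, many output rows out
    // (in the spirit of EXPLODE applied to an array column).
    static List<String> explode(String[] arrayColumn) {
        return new ArrayList<>(Arrays.asList(arrayColumn));
    }

    public static void main(String[] unused) {
        System.out.println(upper("hive"));
        System.out.println(sum(Arrays.asList(1, 2, 3)));
        System.out.println(explode(new String[] {"a", "b", "c"}));
    }
}
```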

The following code example shows how a simple UDF is written. Every UDF extends the UDF class present in org.apache.hadoop...