Mastering Hadoop

By: Karanth
Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not working as fast as you intend? Are you looking to understand the benefits of upgrading Hadoop? If the answer is yes to any of these, this book is for you. It assumes novice-level familiarity with Hadoop.

UDF, UDAF, and UDTF


As in Pig, UDFs are one of the most important extensibility features in Hive. Writing a UDF in Hive is simpler, but the interfaces do not declare every method that a complete UDF must override. This is because a UDF can take any number of parameters, which makes a fixed interface impractical. Instead, Hive uses Java reflection when executing the UDF to determine the function's parameter list.
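
The reflection idea can be sketched in plain Java, without any Hive dependency. The method name evaluate follows Hive's convention for simple UDFs; everything else here (the class names ReflectionDispatchSketch and ToUpperSketch, and the matching logic in call) is a hypothetical illustration and not Hive's actual resolver.

```java
import java.lang.reflect.Method;

public class ReflectionDispatchSketch {

    // Hypothetical stand-in for a UDF: no fixed interface, just overloaded
    // methods that all share the name "evaluate".
    public static class ToUpperSketch {
        public String evaluate(String s) {
            return s == null ? null : s.toUpperCase();
        }

        // Second overload: uppercase only the first n characters.
        public String evaluate(String s, Integer n) {
            if (s == null) return null;
            int cut = Math.max(0, Math.min(n, s.length()));
            return s.substring(0, cut).toUpperCase() + s.substring(cut);
        }
    }

    // Illustrative dispatch (an assumption, not Hive's code): scan the public
    // methods, pick an "evaluate" whose parameter list fits the arguments,
    // and invoke it reflectively.
    static Object call(Object udf, Object... args) throws Exception {
        for (Method m : udf.getClass().getMethods()) {
            if (!m.getName().equals("evaluate")) continue;
            Class<?>[] params = m.getParameterTypes();
            if (params.length != args.length) continue;
            boolean matches = true;
            for (int i = 0; i < params.length; i++) {
                if (args[i] != null && !params[i].isInstance(args[i])) {
                    matches = false;
                    break;
                }
            }
            if (matches) return m.invoke(udf, args);
        }
        throw new NoSuchMethodException("no matching evaluate method");
    }

    public static void main(String[] args) throws Exception {
        ToUpperSketch udf = new ToUpperSketch();
        System.out.println(call(udf, "hive"));    // prints HIVE
        System.out.println(call(udf, "hive", 2)); // prints HIve
    }
}
```

Because the lookup is driven by the argument list at call time, the same class can expose several evaluate overloads without implementing a fixed interface, which is the flexibility the paragraph above describes.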

Hive has the following three kinds of UDFs:

  • Regular UDFs: These UDFs take a single row as input and produce a single row as output after applying the custom logic.

  • UDAFs: These are aggregators that take in multiple rows but output a single row. SUM and COUNT are examples of built-in UDAFs.

  • UDTFs: These are generator functions that take in a single row and produce multiple rows as outputs. The EXPLODE function is a UDTF.
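
The row cardinality of the three kinds can be illustrated with plain Java methods. This is only a sketch of the input and output shapes, not Hive code: the class UdfKindsSketch and its methods are hypothetical, and a comma-separated string stands in for the array column that EXPLODE would receive.

```java
import java.util.Arrays;
import java.util.List;

public class UdfKindsSketch {

    // Regular UDF shape: one input row -> one output row.
    static String upper(String row) {
        return row.toUpperCase();
    }

    // UDAF shape: many input rows -> one output row (similar in spirit to SUM).
    static int sum(List<Integer> rows) {
        int total = 0;
        for (int value : rows) {
            total += value;
        }
        return total;
    }

    // UDTF shape: one input row -> many output rows (similar in spirit to EXPLODE).
    // Assumption: the "array" is modeled as a comma-separated string.
    static List<String> explode(String row) {
        return Arrays.asList(row.split(","));
    }

    public static void main(String[] args) {
        System.out.println(upper("hive"));               // prints HIVE
        System.out.println(sum(Arrays.asList(1, 2, 3))); // prints 6
        System.out.println(explode("a,b,c"));            // prints [a, b, c]
    }
}
```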

The following code example shows how a simple UDF is written. Every UDF extends the UDF class present in org.apache.hadoop...