Learning Hadoop 2

By: Gerald Turkington, Gabriele Modena

Extending HiveQL

HiveQL can be extended by means of plugins and third-party functions. Hive distinguishes three types of functions by the number of rows they take as input and produce as output:

  • User Defined Functions (UDFs): the simplest kind; they act on one row at a time, taking a single row as input and producing a single row as output.

  • User Defined Aggregate Functions (UDAFs): take multiple rows as input and generate a single row as output. These are aggregate functions, typically used in conjunction with a GROUP BY clause (similar to COUNT(), AVG(), MIN(), MAX(), and so on).

  • User Defined Table Functions (UDTFs): take a single row as input and generate a logical table composed of multiple rows that can be used in join expressions.
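
The row-count distinction between the three kinds of function can be sketched in a few lines of plain Python. This is only a conceptual model, not the Hive API: the function names (upper_udf, sum_udaf, explode_udtf) are hypothetical and chosen for illustration. In this sketch the aggregate collapses a group of values into one value and the table function expands one row into several, which mirrors how Hive's built-in SUM() and explode() behave.

```python
# Conceptual model only: plain Python functions, not Hive classes.
# All names here are hypothetical and exist purely for illustration.

def upper_udf(value):
    """UDF-like: one input value produces exactly one output value."""
    return value.upper()


def sum_udaf(values):
    """UDAF-like: a group of input values collapses into one output value."""
    total = 0
    for v in values:
        total += v
    return total


def explode_udtf(row_id, items):
    """UDTF-like: one input row expands into zero or more output rows."""
    for item in items:
        yield (row_id, item)


if __name__ == "__main__":
    print(upper_udf("hive"))
    print(sum_udaf([1, 2, 3]))
    print(list(explode_udtf("a", [1, 2])))
```

The shapes of the return values are the point here: a scalar for the UDF, a scalar per group for the UDAF, and a sequence of rows for the UDTF.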


These APIs are provided only in Java. For other languages, it is possible to stream data through a user-defined script using the TRANSFORM, MAP, and REDUCE clauses, which act as a frontend to Hadoop's streaming capabilities.
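
As a minimal sketch of the streaming approach, the following Python script could serve as the user-defined script behind a TRANSFORM clause. With the default row format, Hive passes each input row to the script as a tab-separated line on standard input and reads tab-separated lines back from standard output. The script name (text_length.py) and the two-column layout (a user identifier followed by a text field) are assumptions made for this example, not something defined in the book.

```python
import sys


def transform(lines):
    """Turn 'user<TAB>text' lines into 'user<TAB>length of text' lines.

    The two-column input layout is an assumption made for this sketch.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Skip malformed rows rather than failing the whole job.
            continue
        user, text = fields[0], fields[1]
        yield "{}\t{}".format(user, len(text))


def main():
    # Hive streams rows on stdin and collects results from stdout.
    for out in transform(sys.stdin):
        print(out)


if __name__ == "__main__":
    main()
```

Assuming a hypothetical table tweets with columns user_id and text, the script would first be shipped to the cluster with ADD FILE text_length.py; and then invoked with a query along the lines of SELECT TRANSFORM (user_id, text) USING 'python text_length.py' AS (user_id, text_length) FROM tweets;. Columns returned by the script are treated as strings unless types are given in the AS clause.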

Two APIs are available to write UDFs. A simple API org.apache.hadoop...