At the time of writing this book, Impala 1.2.0 Beta was available to test with CDH 5.0. Impala 1.2.0 has several user-visible features, along with many under-the-hood changes that improve performance, security, and flexibility. A few notable features are as follows:
Impala supports user-defined functions (UDFs) natively, and users can write both scalar UDFs and user-defined aggregate functions (UDAs).
Functions written in C++ or Java can be used with Impala as they are.
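As a rough sketch of how such functions are registered, the statements below use the CREATE FUNCTION syntax from the Impala documentation. The HDFS paths, the library and JAR names, the function names, and the customers table are hypothetical and serve only as an illustration; the exact syntax accepted by the 1.2.0 Beta may differ.

```sql
-- Register a native C++ scalar UDF from a shared library stored in HDFS.
-- The path, library name, and symbol below are hypothetical.
CREATE FUNCTION has_vowels(STRING) RETURNS BOOLEAN
  LOCATION '/user/impala/udfs/libudfsample.so'
  SYMBOL='HasVowels';

-- Register an existing Java-based UDF packaged in a JAR.
-- The JAR path is hypothetical; SYMBOL names the implementing class.
CREATE FUNCTION my_lower(STRING) RETURNS STRING
  LOCATION '/user/impala/udfs/hive-udfs.jar'
  SYMBOL='org.apache.hadoop.hive.ql.udf.UDFLower';

-- Once registered, the functions are called like built-in functions.
-- The customers table and its name column are assumed to exist.
SELECT my_lower(name), has_vowels(name) FROM customers;
```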
In earlier releases, REFRESH statements were required after every use of table-specific SQL commands, such as CREATE TABLE, ALTER TABLE, DROP TABLE, INSERT, and LOAD DATA, to propagate the change to the whole cluster. Impala now has an automatic synchronization mechanism, so the REFRESH and INVALIDATE METADATA SQL commands are no longer needed after such commands. With this mechanism, a newly created service takes charge of propagating table and metadata changes to the whole Impala cluster as soon as they are available.
Another big update is integration with YARN: Impala uses the YARN resource management framework to manage resources during query processing.