HDInsight Essentials - Second Edition

By: Rajesh Nadipalli
Apache Tez


Apache Tez is an extensible framework for building YARN-based, high-performance data processing applications. Projects such as Hive and Pig can use it to improve performance and shorten response times, which makes them practical for interactive workloads.

HDInsight 3.1 is capable of running Hive queries using Tez, which provides substantial performance improvements over MapReduce. Tez is not enabled for Hive by default; you can enable it for the current session by setting the execution engine before running a query, as shown in the following code snippet:

-- Use Tez instead of MapReduce for the queries in this session
set hive.execution.engine=tez;
select flightyear, flightquarter, flightmonth,
    regexp_replace(uniquecarrier, "\"", "") as airlinecarrier,
    avg(depdelay) as avgdepdelay
from airline_otp_refined
group by flightyear, flightquarter, flightmonth,
    regexp_replace(uniquecarrier, "\"", "");
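
To check which engine a session is using, or to compare the same query under both engines, a short sketch along the following lines may help. It assumes the same airline_otp_refined table and runs from the Hive command line. The simplified query is only an illustration, and the exact wording of the plan output can differ between Hive versions:

```sql
-- Print the current value of the execution engine property
set hive.execution.engine;

-- Show the query plan without running the query; with Tez active,
-- the stage plan is expected to describe Tez vertices rather than
-- MapReduce map and reduce stages
explain
select flightyear, avg(depdelay) as avgdepdelay
from airline_otp_refined
group by flightyear;

-- Switch back to MapReduce for the rest of the session
set hive.execution.engine=mr;
```

Because the set command applies only to the current session, other users and later sessions keep the cluster's configured default.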