Book Image

Apache Hive Essentials

By : Dayong Du
Book Image

Apache Hive Essentials

By: Dayong Du

Overview of this book

Table of Contents (17 chapters)
Apache Hive Essentials
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Basic aggregation – GROUP BY


Data aggregation is any process to gather and express data in a summary form to get more information about particular groups based on specific conditions. Hive offers several built-in aggregate functions, such as MAX, MIN, AVG, and so on. Hive also supports advanced aggregation by using GROUPING SETS, ROLLUP, CUBE, analytic functions, and windowing.

The Hive basic built-in aggregate functions are usually used with the GROUP BY clause. If there is no GROUP BY clause specified, it aggregates over the whole table by default. Besides aggregate functions, all other columns that are selected must also be included in the GROUP BY clause. The following are a few examples using the built-in aggregate functions:

  • Aggregation without GROUP BY columns:

    jdbc:hive2://> SELECT count(*) AS row_cnt FROM employee;
    +----------+
    | row_cnt  |
    +----------+
    | 5        |
    +----------+
    1 row selected (60.709 seconds)
    
  • Aggregation with GROUP BY columns:

    jdbc:hive2://> SELECT sex_age.sex...