Apache Hive Essentials

By: Dayong Du


The SELECT statement


The most common use of Hive is to query data stored in Hadoop. To do this, we write and execute SELECT statements in Hive. A SELECT statement typically projects columns from the rows of the target table that meet the conditions given in the WHERE clause, and returns the result set. SELECT is most often used together with the FROM, DISTINCT, WHERE, and LIMIT keywords. We introduce them through examples as follows.
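A minimal sketch of these clauses follows. It assumes a hypothetical table named `employee` with two hypothetical STRING columns, `name` and `dept`; neither the table nor its columns come from the text, so substitute your own schema.

```sql
-- Hypothetical table: employee(name STRING, dept STRING)

-- Return every column of every row, including duplicate rows
SELECT * FROM employee;

-- Return each distinct value of the dept column only once
SELECT DISTINCT dept FROM employee;

-- Keep only the rows that satisfy the WHERE condition,
-- then return at most 2 of them
SELECT name, dept
FROM employee
WHERE dept = 'Sales'
LIMIT 2;
```

Because there is no ORDER BY clause, the last query may return any 2 of the matching rows.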

SELECT * selects all the columns in the table. By default, all rows are returned, including duplicate rows. If the DISTINCT keyword is used, only unique rows (distinct combinations of the selected columns) are returned. The LIMIT keyword caps the number of rows returned; without an ORDER BY clause, which rows are returned is not guaranteed. In addition, a plain SELECT * reads the whole table/file directly without triggering a MapReduce job, so it runs faster than SELECT <column_name>. Since Hive 0.10.0, the simple SELECT statements, such as SELECT <column_name>...
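As a hedged illustration, the Hive configuration property `hive.fetch.task.conversion` controls whether simple queries skip MapReduce. Setting it to `more` lets queries limited to column projection, WHERE filters, and LIMIT run as a direct fetch task. Defaults and exact behavior differ between Hive releases, so check the configuration documentation for your version. The `employee` table and its `name` and `dept` columns below are hypothetical.

```sql
-- Allow simple SELECTs (column projection, filters, LIMIT)
-- to be converted into a fetch task instead of a MapReduce job
SET hive.fetch.task.conversion=more;

-- With the setting above, this query can be answered by reading
-- the table's files directly (hypothetical table and columns)
SELECT name, dept FROM employee WHERE dept = 'Sales' LIMIT 5;

-- Inspect the plan: a converted query shows only a Fetch Operator
-- stage and no Map Reduce stage
EXPLAIN SELECT name, dept FROM employee LIMIT 5;
```

Running EXPLAIN before and after changing the property is a quick way to confirm whether a given query is actually converted on your cluster.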