Apache Hive Essentials

By: Dayong Du

Hive partitions


By default, a simple query in Hive scans the whole table, which slows performance when the table is large. Hive partitions solve this problem and work much like partitions in an RDBMS. In Hive, each partition corresponds to a value (or combination of values) of one or more predefined partition columns and is stored as a subdirectory of the table's directory in HDFS. When a query filters on the partition columns, only the required partitions (directories) are read, so the I/O and query time are greatly reduced. Partitions are easy to define when the table is created, and the partitions that exist can be checked afterwards, as follows:

-- Create partitions when creating tables
jdbc:hive2://> CREATE TABLE employee_partitioned
. . . . . . .> (
. . . . . . .>   name string,
. . . . . . .>   work_place ARRAY<string>,
. . . . . . .>   sex_age STRUCT<sex:string,age:int>,
. . . . . . .>   skills_score MAP<string,int>,
. . . . . . .>   depart_title...
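To make the directory mechanism concrete, the following Python sketch simulates it outside Hive. It assumes a hypothetical table `sales_partitioned` partitioned by `year` and `month` and stored under Hive's default warehouse directory `/user/hive/warehouse`; the helpers `partition_path` and `prune` are illustrative names, not part of any Hive API. It shows how each partition maps to a `column=value` subdirectory and how a filter on the partition columns leaves only the matching directories to be read.

```python
from pathlib import PurePosixPath

# Hypothetical table location under Hive's default warehouse directory.
TABLE_DIR = PurePosixPath("/user/hive/warehouse/sales_partitioned")

def partition_path(table_dir, spec):
    """Build the subdirectory for one partition, e.g. year=2014/month=12.

    `spec` is an ordered tuple of (column, value) pairs, one per partition column.
    """
    path = table_dir
    for column, value in spec:
        path = path / f"{column}={value}"
    return path

def prune(partitions, predicate):
    """Keep only partitions whose column values satisfy the predicate.

    This mimics the effect of partition pruning: directories that cannot
    hold matching rows are skipped entirely instead of being scanned.
    """
    return [spec for spec in partitions if predicate(dict(spec))]

# Hypothetical partitions that already exist for the table.
partitions = [
    (("year", 2014), ("month", 11)),
    (("year", 2014), ("month", 12)),
    (("year", 2015), ("month", 1)),
]

# Equivalent in spirit to: WHERE year = 2014 AND month = 12
selected = prune(partitions, lambda cols: cols["year"] == 2014 and cols["month"] == 12)
for spec in selected:
    print(partition_path(TABLE_DIR, spec))
```

In Hive itself, the query planner performs this pruning when the WHERE clause references partition columns; a filter that uses only non-partition columns cannot skip any directories, so the whole table is still scanned.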