Apache Hive Cookbook

Overview of this book

Hive was developed by Facebook and later open sourced to the Apache community. It provides an SQL-like interface for running queries on Big Data frameworks. Its SQL-like syntax, also called HiveQL, includes SQL capabilities such as analytical functions, which are in high demand in today’s Big Data world. This book provides easy installation steps for the different types of metastores supported by Hive, along with simple, easy-to-learn recipes for configuring Hive clients and services. You will also learn about Hive optimizations, including partitioning and bucketing. The book also explains the source code of the latest Hive version. Hive Query Language is used by other frameworks as well, including Spark; towards the end, you will cover the integration of Hive with these frameworks.

Hive packages


The following are the various sections included in Hive packages.

Getting ready

The Hive source is organized into modules. Each module is either grouped by the feature it provides or included as a submodule of another module.

How to do it...

The following is the list of Hive modules and their usage in Hive:

  • accumulo-handler: Apache Accumulo is a distributed key-value datastore based on Google's Bigtable design. This package includes the components responsible for mapping a Hive table to an Accumulo table. AccumuloStorageHandler and AccumuloPredicateHandler are the main classes responsible for mapping tables. For more information, refer to the official integration documentation available at https://cwiki.apache.org/confluence/display/Hive/AccumuloIntegration.

  • ant: Ant is the build tool used by earlier versions of the Hive source. It is also needed to configure the Hive Web Interface server.

  • beeline: A Hive client used to connect with HiveServer2 and run Hive queries.

  • bin: This package includes scripts to start Hive clients and services.

  • cli: This is a Hive Command-line Interface implementation.

  • common: These are utility classes used by other modules.

  • conf: This contains the default configurations and user-defined configuration objects.

  • contrib: This contains SerDes, generic UDFs, and file formats contributed to Hive by third parties.

  • hbase-handler: This module allows Hive SQL statements to access HBase tables for SELECT and INSERT commands. It also provides interfaces to access HBase and Hive tables for join and union in a single query. More information is available at https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration.

  • hcatalog: This is a table management framework that helps other frameworks such as Pig or MapReduce to access the Hive metastore and table schema.

  • hwi: This module implements a web interface for running Hive queries. Separately, WebHCat provides REST APIs for accessing the Hive metastore.

  • jdbc: This is the connector that accepts JDBC connections and calls in order to execute Hive queries on the cluster.

  • metastore: This is the API that provides access to metastore entities, including databases, tables, schemas, and SerDes.

  • odbc: This module implements the Open Database Connectivity (ODBC) API, enabling ODBC applications to connect and execute queries over Hive.

  • ql: This module provides the client-facing interface that checks query semantics, and it implements the driver, parser, and query planner.

  • serde: This module implements the serializers and deserializers (SerDes) that Hive uses to read and write data. It helps in validating and parsing record and field types.

  • shims: This is the module that transparently intercepts and modifies calls to the Hive API, usually for compatibility purposes.

  • spark-client: This module provides an interface for executing Hive SQL queries on the Spark framework.
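
To make the role of the serde module in the preceding list more concrete, the following is a minimal Python sketch of what a serializer/deserializer does: it parses a raw text record into typed fields, validates them against a table schema, and writes a row back out. It is an illustration only and not Hive's actual SerDe API (which is written in Java). The DelimitedSerDe class, the CONVERTERS table, the comma delimiter, and the sample columns are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical mapping of type names to Python converters.
# This is NOT Hive's type system; it only mimics the idea.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "double": float,
    "string": str,
}


@dataclass
class DelimitedSerDe:
    """Toy serializer/deserializer for delimited text records."""

    schema: List[Tuple[str, str]]  # (column name, type name) pairs
    delimiter: str = ","           # assumed delimiter for readability

    def deserialize(self, line: str) -> Dict[str, Any]:
        """Parse one raw record into a typed row, validating it on the way."""
        fields = line.split(self.delimiter)
        if len(fields) != len(self.schema):
            raise ValueError(
                f"expected {len(self.schema)} fields, got {len(fields)}"
            )
        row: Dict[str, Any] = {}
        for (name, type_name), raw in zip(self.schema, fields):
            # A converter raises ValueError if the raw text does not
            # match the declared field type (for example, int("abc")).
            row[name] = CONVERTERS[type_name](raw)
        return row

    def serialize(self, row: Dict[str, Any]) -> str:
        """Write a typed row back out as one delimited record."""
        return self.delimiter.join(str(row[name]) for name, _ in self.schema)


if __name__ == "__main__":
    serde = DelimitedSerDe(
        schema=[("id", "int"), ("name", "string"), ("score", "double")]
    )
    row = serde.deserialize("1,alice,9.5")
    print(row)
    print(serde.serialize(row))
```

In this sketch, deserialization is the read path (raw bytes to typed row) and serialization is the write path (typed row to raw bytes). A real Hive SerDe plays the same two roles between the file format and the query engine, but with Hive's own type and object model.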