Apache Hive Essentials

By: Dayong Du

HCatalog


HCatalog (see https://cwiki.apache.org/confluence/display/Hive/HCatalog) is a metadata management system for Hadoop data. It stores consistent schema information for Hadoop ecosystem tools such as Pig, Hive, and MapReduce. By default, HCatalog supports data in the RCFile, CSV, JSON, SequenceFile, and ORC file formats, as well as custom formats for which an InputFormat, OutputFormat, and SerDe are implemented. With HCatalog, users can directly create, edit, and expose (via its REST API) metadata, and changes take effect immediately in all tools that share that metadata. HCatalog began as a separate project in the Apache Incubator, where most Apache projects start. It was merged into the Hive project in 2013, starting with Hive 0.11.0.
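
As a rough illustration of the REST API mentioned above, the following Python sketch builds the URL that HCatalog's REST server (WebHCat) would serve for listing the tables in a database. It only constructs the URL and does not contact a server. The host name `localhost`, the user name `hive`, and the helper name `webhcat_tables_url` are assumptions made for this example; port 50111 is the usual WebHCat default, but your installation may differ.

```python
from urllib.parse import quote, urlencode


def webhcat_tables_url(host, database="default", user="hive", port=50111):
    """Build a WebHCat URL that lists the tables in one database.

    host, user, and this helper's name are hypothetical values chosen
    for illustration; 50111 is assumed to be the WebHCat port.
    """
    # WebHCat exposes DDL resources under /templeton/v1/ddl/...
    path = f"/templeton/v1/ddl/database/{quote(database, safe='')}/table"
    # WebHCat identifies the caller through the user.name query parameter
    query = urlencode({"user.name": user})
    return f"http://{host}:{port}{path}?{query}"


print(webhcat_tables_url("localhost"))
```

Sending an HTTP GET request to such a URL (for example with `curl` or `urllib.request`) should return a JSON document describing the tables, provided a WebHCat server is running and the user has access to the database.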

HCatalog is built on top of the Hive metastore and incorporates support for Hive DDL. It provides read and write interfaces for Pig, HCatLoader and HCatStorer, by implementing Pig...