In this recipe, you will learn how you can define tables in HCatalog.
HCatalog is a table and storage management layer that enables frameworks other than Hive to use a shared data model for reading and writing data. HCatalog tables provide an abstraction over the data format in HDFS. Frameworks such as Pig and MapReduce can therefore use the data without being concerned about whether it is stored as RCFile, ORC, or text files.
HCatInputFormat and HCatOutputFormat, which implement Hadoop's InputFormat and OutputFormat, are the interfaces that HCatalog provides to Pig and MapReduce.
Tables are defined using the HCatalog CLI. Data is modeled as tables, tables are stored in databases, and a table can be partitioned on one or more keys.
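As a rough illustration of how databases, tables, partition keys and storage formats fit together, the following Python sketch builds the kind of CREATE TABLE statement you might pass to the HCatalog CLI with hcat -e. The helper names (create_table_ddl, hcat_command, partition_path) and the example database, table and column names are hypothetical. The partition path assumes the default Hive warehouse directory /user/hive/warehouse, which your cluster may override. The code only builds strings and does not contact HCatalog.

```python
from typing import Dict, List, Tuple


def create_table_ddl(database: str, table: str,
                     columns: List[Tuple[str, str]],
                     partition_keys: List[Tuple[str, str]],
                     stored_as: str = "ORC") -> str:
    """Build a CREATE TABLE statement for a (hypothetical) partitioned table."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    ddl = f"CREATE TABLE {database}.{table} ({cols})"
    if partition_keys:
        # Partition keys are declared separately from the regular columns.
        parts = ", ".join(f"{name} {ctype}" for name, ctype in partition_keys)
        ddl += f" PARTITIONED BY ({parts})"
    # The storage format (e.g. ORC, RCFILE, TEXTFILE) is fixed at definition time;
    # readers going through HCatalog do not need to know it.
    return f"{ddl} STORED AS {stored_as}"


def hcat_command(ddl: str) -> List[str]:
    """Argument list for running one DDL statement with the hcat CLI's -e option."""
    return ["hcat", "-e", ddl]


def partition_path(database: str, table: str,
                   partition_values: Dict[str, str],
                   warehouse: str = "/user/hive/warehouse") -> str:
    """Directory for one partition; the warehouse location is an assumed default."""
    base = f"{warehouse}/{database}.db/{table}"
    # Each partition key becomes a nested key=value subdirectory.
    return base + "".join(f"/{key}={value}" for key, value in partition_values.items())


if __name__ == "__main__":
    ddl = create_table_ddl("sales", "orders",
                           [("id", "INT"), ("name", "STRING")],
                           [("dt", "STRING")])
    print(" ".join(hcat_command(repr(ddl))))
    print(partition_path("sales", "orders", {"dt": "2014-01-01"}))
```

Keeping the partition keys out of the column list mirrors how partitioned tables are declared. Each distinct set of partition values then maps to its own subdirectory, so a job that reads through HCatInputFormat can be restricted to the partitions it needs.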