Book Image

Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Book Image

Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Overview of this book

Table of Contents (19 chapters)
Hadoop MapReduce v2 Cookbook Second Edition
Credits
About the Author
Acknowledgments
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

HCatalog – performing Java MapReduce computations on data mapped to Hive tables


HCatalog is a meta-data abstraction layer for files stored in HDFS and makes it easy for different components to process data stored in HDFS. HCatalog abstraction is based on tabular table model and augments structure, location, storage format and other meta-data information for the data sets stored in HDFS. With HCatalog, we can use data processing tools such as Pig, Java MapReduce and others read and write data to Hive tables without worrying about the structure, storage format or the storage location of the data. HCatalog is very useful when you want to execute a Java MapReduce job or a Pig script on a data set that is stored in Hive using a binary data format such as ORCFiles. The topology can be seen as follows:

HCatalog achieves this capability by providing an interface to the Hive MetaStore enabling the other applications to utilize the Hive table metadata information. We can query the table information...