Mastering Hadoop

By: Karanth
Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not working as fast as you intend? Are you looking to understand the benefits of upgrading Hadoop? If the answer is yes to any of these, this book is for you. It assumes novice-level familiarity with Hadoop.
The Hive architecture

The following diagram shows the Hive architecture. We will look at each component in detail:

[Figure: The Hive architecture]

The Hive metastore

The metastore is a database for system-related metadata. It stores details about tables, partitions, schemas, column types, and table locations. It is accessible through a Thrift interface, so clients written in many different programming languages can read this data. The metadata itself is kept in a relational database and is read and written through an object-relational mapping (ORM) layer. An RDBMS was chosen for the metastore to reduce latency when serving this information to the Hive query compiler.
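To make the idea of relational metadata concrete, the following is a minimal sketch in Python that models a toy metastore in an in-memory SQLite database. The schema, the table names (`tbls`, `cols`, `parts`), the sample table `page_views`, and its warehouse paths are all assumptions made for illustration; the real metastore schema is larger and named differently, and real clients would go through the Thrift interface rather than query the database directly.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only;
# the real Hive metastore schema has more tables and different names.
SCHEMA = """
CREATE TABLE tbls (
    tbl_id   INTEGER PRIMARY KEY,
    tbl_name TEXT NOT NULL UNIQUE,
    location TEXT NOT NULL
);
CREATE TABLE cols (
    tbl_id   INTEGER NOT NULL REFERENCES tbls(tbl_id),
    position INTEGER NOT NULL,
    col_name TEXT NOT NULL,
    col_type TEXT NOT NULL
);
CREATE TABLE parts (
    tbl_id    INTEGER NOT NULL REFERENCES tbls(tbl_id),
    part_spec TEXT NOT NULL,
    location  TEXT NOT NULL
);
"""


def build_demo_metastore():
    """Create an in-memory store holding metadata for one made-up table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO tbls VALUES (1, 'page_views', '/warehouse/page_views')"
    )
    conn.executemany(
        "INSERT INTO cols VALUES (1, ?, ?, ?)",
        [(0, "user_id", "bigint"), (1, "url", "string")],
    )
    conn.executemany(
        "INSERT INTO parts VALUES (1, ?, ?)",
        [("dt=2014-01-01", "/warehouse/page_views/dt=2014-01-01")],
    )
    conn.commit()
    return conn


def describe_table(conn, name):
    """Return the location, columns, and partitions recorded for a table."""
    row = conn.execute(
        "SELECT tbl_id, location FROM tbls WHERE tbl_name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    tbl_id, location = row
    columns = conn.execute(
        "SELECT col_name, col_type FROM cols WHERE tbl_id = ? ORDER BY position",
        (tbl_id,),
    ).fetchall()
    partitions = conn.execute(
        "SELECT part_spec, location FROM parts WHERE tbl_id = ? ORDER BY part_spec",
        (tbl_id,),
    ).fetchall()
    return {"location": location, "columns": columns, "partitions": partitions}
```

Calling `describe_table(build_demo_metastore(), "page_views")` returns the kind of information a query compiler needs before it can plan a query: where the table's data lives, which columns it has and of what type, and which partitions exist. Each lookup is a small indexed query, which is the sort of low-latency access that motivates keeping this metadata in an RDBMS.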

The metastore's ORM layer makes the backing store pluggable, so any RDBMS can be plugged into Hive. The default is Apache Derby, an open source relational database. In practice, organizations host the metastore on MySQL or another popular RDBMS. The data in the metastore imposes...