Apache Hive Essentials

By: Dayong Du

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Overview of Big Data and Hive

A short history

Introducing big data

Relational and NoSQL database versus Hadoop

Batch, real-time, and stream processing

Overview of the Hadoop ecosystem

Setting Up the Hive Environment

Installing Hive from Apache

Installing Hive from vendor packages

Starting Hive in the cloud

Using the Hive command line and Beeline

The Hive-integrated development environment

Data Definition and Description

Understanding Hive data types

Data type conversions

Hive Data Definition Language

Hive internal and external tables

Hive partitions

Data Selection and Scope

The SELECT statement

The INNER JOIN statement

The OUTER JOIN and CROSS JOIN statements

Special JOIN – MAPJOIN

Set operation – UNION ALL

Data Manipulation

Data exchange – LOAD

Data exchange – INSERT

Data exchange – EXPORT and IMPORT

Operators and functions

Data Aggregation and Sampling

Basic aggregation – GROUP BY

Advanced aggregation – GROUPING SETS

Advanced aggregation – ROLLUP and CUBE

Aggregation condition – HAVING

Analytic functions

Performance Considerations

Performance utilities

Design optimization

Data file optimization

Job and query optimization

Extensibility Considerations

User-defined functions

Security Considerations

Working with Other Tools

JDBC / ODBC connector

Index

Summary

After working through this chapter, we can explain why and when to use big data technologies instead of a traditional relational database. We also understand the differences between batch processing, real-time processing, and stream processing. We traced the history from databases and data warehouses to big data, covered some common big data terms, and became familiar with the Hadoop ecosystem, the Hive architecture, and the advantages of using Hive. In the next chapter, we will practice setting up Hive and the tools needed to start using Hive from the command line.