
Apache Hive Essentials

By : Dayong Du

Job and query optimization


Job and query optimization covers techniques for improving performance in four areas: the job-running mode, JVM reuse, parallel job execution, and JOIN query optimization.

Local mode

Hadoop can run in standalone, pseudo-distributed, or fully distributed mode. Most of the time, we configure Hadoop to run in fully distributed mode. When the data to process is small, however, starting distributed processing is pure overhead, because launching a fully distributed job can take longer than the processing itself. Since Hive 0.7.0, Hive can automatically convert such a job to run in local mode with the following settings:

jdbc:hive2://> SET hive.exec.mode.local.auto=true; --default false
jdbc:hive2://> SET hive.exec.mode.local.auto.inputbytes.max=50000000; --default 134217728 (128 MB)
jdbc:hive2://> SET hive.exec.mode.local.auto.input.files.max=5; --default 4
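To make the effect of these three settings concrete, here is a simplified Python model of the auto-local-mode decision. It is a sketch, not Hive's implementation: the class `LocalModeSettings` and the function `runs_locally` are hypothetical names. Two further points are assumptions drawn from Hive's documentation rather than from the text above: the limits are treated as inclusive, and the job may need at most one reducer.

```python
from dataclasses import dataclass


@dataclass
class LocalModeSettings:
    """Hypothetical mirror of the three Hive settings shown above."""
    auto: bool = False                 # hive.exec.mode.local.auto
    inputbytes_max: int = 134_217_728  # hive.exec.mode.local.auto.inputbytes.max (128 MB)
    input_files_max: int = 4           # hive.exec.mode.local.auto.input.files.max


def runs_locally(settings: LocalModeSettings, input_bytes: int,
                 input_files: int, reducers: int) -> bool:
    """Simplified model of whether Hive would run a job in local mode.

    Assumptions (not Hive's actual code): limits are inclusive, and the
    job must need at most one reducer, as described in Hive's documentation.
    """
    if not settings.auto:
        # Automatic conversion is off, so the job stays distributed.
        return False
    return (input_bytes <= settings.inputbytes_max
            and input_files <= settings.input_files_max
            and reducers <= 1)


# Same values as the SET commands above.
settings = LocalModeSettings(auto=True, inputbytes_max=50_000_000, input_files_max=5)
print(runs_locally(settings, input_bytes=10_000_000, input_files=3, reducers=1))  # True
print(runs_locally(settings, input_bytes=80_000_000, input_files=3, reducers=1))  # False
```

The second call stays distributed because 80,000,000 bytes exceeds the 50,000,000-byte limit. All of the checks must pass before a job is converted to local mode.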

A job must satisfy all of the following conditions to run in local mode:

  • The total input size of the job is lower...