Apache Hive Essentials

By: Dayong Du

Special JOIN – MAPJOIN


A MAPJOIN performs the JOIN operation entirely in the map phase, without a reduce job. Hive reads all the data from the small table into memory and broadcasts it to every mapper. During the map phase, each row of the big table is compared with the in-memory rows of the small table against the join conditions. Because no reduce phase is needed, JOIN performance improves. When the hive.auto.convert.join setting is set to true, Hive automatically converts a JOIN to a MAPJOIN at runtime where possible, instead of relying on the map join hint. In addition, MAPJOIN can be used for unequal (non-equi) joins to improve performance, since both the MAPJOIN and the WHERE filter are performed in the map phase. The following is an example of a MAPJOIN that is enabled by a query hint:

jdbc:hive2://> SELECT /*+ MAPJOIN(emp) */ emp.name, emph.sin_number
. . . . . . .> FROM employee emp
. . . . . . .> CROSS JOIN employee_hr emph WHERE emp.name <> emph.name;
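
For comparison, the automatic conversion described above might look like the following sketch. It reuses the employee and employee_hr tables from the hinted example and assumes that one of them is small enough to fit under the size threshold. The threshold value is illustrative only (it is the commonly documented default), and whether the conversion actually happens depends on the Hive version and table statistics:

```sql
-- Let Hive decide on its own whether to convert the JOIN to a MAPJOIN.
SET hive.auto.convert.join=true;

-- Size limit in bytes below which a table counts as "small".
-- Assumption: 25000000 is used here purely as an illustrative value.
SET hive.mapjoin.smalltable.filesize=25000000;

-- No hint is given; the plan should show a map join operator
-- if Hive chose the map-side join for this query.
EXPLAIN
SELECT emp.name, emph.sin_number
FROM employee emp
JOIN employee_hr emph ON emp.name = emph.name;
```

Running the query with EXPLAIN first is one way to check the chosen join strategy before executing it against the full data.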

The MAPJOIN...