In this recipe, you will learn how to use predicate pushdown in Hive.
Predicate pushdown is a traditional RDBMS term, whereas in Hive, it works as predicate pushup. In this, the focus is on to execute all the expressions such as filters as early as possible to optimize the performance of a query. For example, let's look at the query mentioned later, which includes a join condition as well as a filter condition:
SELECT a.*, b.* FROM Sales a JOIN Sales_orc b ON a.id = b.id WHERE a.id > 100 AND b.id > 300;
In the preceding query, a JOIN
is performed at the ID
column of both the tables and then the result set is filtered out with the help of the filter condition. The drawback here is that the join
condition is executed first followed by the filter condition. Now suppose if most of the rows are filtered out by the filter
expression, then in this case, executing the filter
condition after the JOIN
clause is of no use. There has...