In the previous section, we saw subqueries that only returned a single result because an aggregate function was used in the subquery. Subqueries can also return zero or more rows.
Subqueries that return multiple rows can be used with the ALL, IN, ANY, or SOME operators. We can also negate the condition, as in NOT IN.
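As a minimal sketch of these operators, the statements below run against a hypothetical temporary table, employee_demo. The table name, columns, and rows are invented for illustration, and the whole script is wrapped in a transaction that is rolled back so it leaves nothing behind.

```sql
BEGIN;

-- Hypothetical sample data, invented for this illustration.
CREATE TEMP TABLE employee_demo (
    last_name     text,
    salary        numeric,
    department_id integer
);

INSERT INTO employee_demo VALUES
    ('Adams', 3000, 10),
    ('Baker', 4500, 10),
    ('Clark', 5000, 20),
    ('Davis', 7000, 20),
    ('Evans', 6000, NULL);

-- IN: salary equals at least one salary found in department 20.
-- (= ANY and = SOME are equivalent spellings of IN.)
SELECT last_name
FROM employee_demo
WHERE salary IN (SELECT salary FROM employee_demo WHERE department_id = 20);

-- > ANY: salary is greater than at least one salary in department 20.
SELECT last_name
FROM employee_demo
WHERE salary > ANY (SELECT salary FROM employee_demo WHERE department_id = 20);

-- > ALL: salary is greater than every salary in department 10.
SELECT last_name
FROM employee_demo
WHERE salary > ALL (SELECT salary FROM employee_demo WHERE department_id = 10);

-- NOT IN: department is not among those that have an employee earning over 6000.
-- NULLs are filtered out of the subquery on purpose (see the note below).
SELECT last_name
FROM employee_demo
WHERE department_id NOT IN (
    SELECT department_id
    FROM employee_demo
    WHERE salary > 6000
      AND department_id IS NOT NULL
);

ROLLBACK;  -- discard the demo table
```

One point to watch with NOT IN: if the subquery returns a NULL, the NOT IN condition can never evaluate to true, so the outer query returns no rows. Filtering NULLs inside the subquery, as above, avoids that surprise.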
A subquery that references one or more columns from its containing SQL statement is called a correlated subquery. Unlike non-correlated subqueries that are executed exactly once prior to the execution of a containing statement, a correlated subquery is executed once for each candidate row in the intermediate result set of the containing query.
The following statement illustrates the syntax of a correlated subquery:
SELECT column1, column2, ... FROM table1 o WHERE column1 operator (SELECT column1 FROM table2 WHERE column2 = o.column4);
PostgreSQL passes the value of column4 from the outer table (aliased here as o) to the inner query, where it is compared to column2 of table2. The inner query then returns column1 from the matching rows of table2, and that value is compared, using the operator, to column1 of the outer table. If the expression evaluates to true, the row is included in the result; otherwise, it does not appear in the output.
Correlated subqueries can, however, cause performance problems, because the subquery may be executed for every row of the outer query. How severe this is depends entirely on the data involved. To help such a query run efficiently, we can use temporary tables.
Let's try to find all the employees who earn more than the average salary in their department:
SELECT last_name, salary, department_id FROM employee o WHERE salary > (SELECT AVG(salary) FROM employee WHERE department_id = o.department_id);
For each row of the employee table, the value of department_id is passed into the inner query (suppose the department_id of the first row is 30), and the inner query computes the average salary for department_id = 30. If the salary of that row is greater than the average salary for department_id = 30, the expression evaluates to true and the row appears in the output.
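To make the walkthrough concrete, here is a small self-contained sketch. The sample rows are invented, and the dept_avg table is a hypothetical name used to illustrate the temporary-table approach mentioned earlier; it is one possible rewrite, not necessarily the one the planner or the author would choose.

```sql
BEGIN;

-- Hypothetical sample data; the column names follow the query above.
CREATE TEMP TABLE employee (
    last_name     text,
    salary        numeric,
    department_id integer
);

INSERT INTO employee VALUES
    ('Adams', 3000, 30),
    ('Baker', 5000, 30),
    ('Clark', 4000, 40),
    ('Davis', 4000, 40);

-- Correlated form: the inner AVG is evaluated for the department
-- of each outer row.
SELECT e.last_name, e.salary, e.department_id
FROM employee e
WHERE e.salary > (SELECT AVG(i.salary)
                  FROM employee i
                  WHERE i.department_id = e.department_id);

-- Temporary-table form: compute every department's average once,
-- then join the outer rows against the precomputed averages.
CREATE TEMP TABLE dept_avg AS
SELECT department_id, AVG(salary) AS avg_salary
FROM employee
GROUP BY department_id;

SELECT e.last_name, e.salary, e.department_id
FROM employee e
JOIN dept_avg d ON d.department_id = e.department_id
WHERE e.salary > d.avg_salary;

ROLLBACK;  -- discard the demo tables
```

With this data, both queries return only Baker: department 30 averages 4000, and in department 40 no salary exceeds the average of 4000.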
The PostgreSQL EXISTS condition is used in combination with a subquery, and is considered to be met if the subquery returns at least one row. It can be used in a SELECT, INSERT, UPDATE, or DELETE statement. If a subquery returns any rows at all, the EXISTS subquery is true, and the NOT EXISTS subquery is false.
The syntax for the PostgreSQL EXISTS condition is as follows:
WHERE EXISTS ( subquery );
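The subquery is a SELECT statement that often starts with SELECT * rather than a list of expressions or column names. Because the result of EXISTS depends only on whether any rows are returned, and not on their contents, a common convention is to write SELECT 1 instead of SELECT *. This makes the intent explicit; PostgreSQL does not need the selected columns to evaluate the condition, so the choice should not be expected to change performance noticeably.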
Note
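Conceptually, the subquery of an EXISTS condition is evaluated for every row of the outer query, which can be slow on large tables. In practice, PostgreSQL stops evaluating the subquery as soon as it finds one row, and its planner can often execute an EXISTS condition as a semi-join. When performance matters, compare the plan of the EXISTS query with that of an equivalent join rather than assuming one form is faster.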
Let's look at the following example, a SELECT statement that uses the PostgreSQL EXISTS condition:
SELECT * FROM products WHERE EXISTS (SELECT 1 FROM inventory WHERE products.product_id = inventory.product_id);
This PostgreSQL EXISTS condition example returns all records from the products table for which there is at least one record in the inventory table with a matching product_id. We used SELECT 1 in the subquery because the column list is not relevant to the EXISTS condition (only the existence of a returned row matters).
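The note above mentions joins as an alternative to EXISTS. The following sketch puts the two forms side by side on invented sample data. The product_name, inventory_id, and quantity columns are assumptions added for illustration; only product_id appears in the original example.

```sql
BEGIN;

-- Hypothetical schema and rows; only product_id comes from the example above.
CREATE TEMP TABLE products (
    product_id   integer PRIMARY KEY,
    product_name text
);

CREATE TEMP TABLE inventory (
    inventory_id integer PRIMARY KEY,
    product_id   integer,
    quantity     integer
);

INSERT INTO products VALUES (1, 'Pear'), (2, 'Banana'), (3, 'Orange');
INSERT INTO inventory VALUES (100, 1, 5), (101, 1, 7), (102, 2, 0);

-- EXISTS form: each product appears at most once, no matter how many
-- inventory rows match it.
SELECT p.*
FROM products p
WHERE EXISTS (SELECT 1 FROM inventory i WHERE i.product_id = p.product_id);

-- Join form: DISTINCT is needed because product 1 has two inventory rows
-- and would otherwise be returned twice.
SELECT DISTINCT p.*
FROM products p
JOIN inventory i ON i.product_id = p.product_id;

-- EXPLAIN shows the plan PostgreSQL chooses, which is the reliable way
-- to compare the two forms on real data.
EXPLAIN
SELECT p.*
FROM products p
WHERE EXISTS (SELECT 1 FROM inventory i WHERE i.product_id = p.product_id);

ROLLBACK;  -- discard the demo tables
```

Both SELECT statements return products 1 and 2 here. The join form only matches the EXISTS form because of DISTINCT, which is worth keeping in mind when rewriting one as the other.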
The PostgreSQL EXISTS condition can also be combined with the NOT operator, for example:
SELECT * FROM products WHERE NOT EXISTS (SELECT 1 FROM inventory WHERE products.product_id = inventory.product_id);
This PostgreSQL NOT EXISTS example returns all records from the products table for which there are no records in the inventory table with the given product_id.
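As a rough illustration, the sketch below runs NOT EXISTS on invented sample data, shows an outer-join formulation that returns the same rows, and uses the condition in a DELETE statement, since EXISTS is not limited to SELECT. The product_name and quantity columns and all rows are assumptions made for the example.

```sql
BEGIN;

-- Hypothetical schema and rows; only product_id comes from the example above.
CREATE TEMP TABLE products (
    product_id   integer PRIMARY KEY,
    product_name text
);

CREATE TEMP TABLE inventory (
    product_id integer,
    quantity   integer
);

INSERT INTO products VALUES (1, 'Pear'), (2, 'Banana'), (3, 'Orange');
INSERT INTO inventory VALUES (1, 5), (2, 0);

-- NOT EXISTS form: products with no inventory row at all.
SELECT p.*
FROM products p
WHERE NOT EXISTS (SELECT 1 FROM inventory i WHERE i.product_id = p.product_id);

-- Outer-join form: unmatched products get NULLs in the inventory columns,
-- so filtering on i.product_id IS NULL keeps only those.
SELECT p.*
FROM products p
LEFT JOIN inventory i ON i.product_id = p.product_id
WHERE i.product_id IS NULL;

-- The same condition in a data-modifying statement: remove products
-- that have no inventory rows.
DELETE FROM products p
WHERE NOT EXISTS (SELECT 1 FROM inventory i WHERE i.product_id = p.product_id);

ROLLBACK;  -- undo the DELETE and discard the demo tables
```

With this data, both SELECT statements return only product 3 (Orange), and the DELETE would remove that same row.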