SQL Correlated Subqueries Increase the Power of SQL

A SQL correlated subquery is a query which is executed one time for each record returned by the outer query. It is called correlated as it is a correlation between the number of times the subquery is executed with the number of records returned by the outer query (not the subquery).

Correlated subqueries are a different type of subquery. In some situations, you must use a correlated subquery to obtain the correct reply or result data set. This lesson covers how to identify the situations that require a correlated subquery. There are examples of correlated subqueries, and a discussion about overusing correlated subqueries.

Using SQL Correlated Subqueries

There are some data questions where the SQL to solve the question must include a correlated subquery. This is particularly true in queries where you are looking for what might be called negatives or negative data questions.

These examples use two tables: employee and payment_history. The payment_history table has a column called payment_type that indicates if a payment is part of a regular salary, a bonus, or an award. Here is some sample data for the tables:

Table employee

employee_id first_name last_name
10100 Marcos Bisset
10101 Kate Perez
10102 Carlos Casco

Table payment_history

employee_id payment_type amount_paid payment_date
10100 monthly pay 12000.00 2018-02-02
10101 monthly pay 2800.00 2018-02-02
10102 monthly pay 1900.00 2018-03-02
10101 bonus 1500.00 2018-03-08
10102 adjustment 124.70 2018-03-10

Here is an SQL query to answer a negative data question: List the employees who have never received a bonus.

SELECT last_name, first_name
  FROM employee e1
                     FROM payment_history ph
                    WHERE ph.employee_id = e1.employee_id
                      AND ph.payment_type = ‘bonus’

The result:

first_name last_name
Marcos Bisset
Carlos Casco

Correlated vs Simple Subqueries

The main difference between correlated and simple subqueries is that correlated subqueries reference columns from the outer table. In the example, ph.employee_id = e1.employee_id is a reference to the outer subquery table. To identify a correlated query, just look for these kinds of references. If you find at least one, you have a correlated subquery!

The negative part of the data question is often solved in the correlated subquery by using a NOT EXISTS operator in the WHERE clause. EXISTS is an operator always followed by a subquery. If the subquery returns at least one record, then the EXISTS evaluates to TRUE. If the subquery returns empty, then the EXISTS evaluates to FALSE. Note we use NOT EXISTS, which is the opposite to EXISTS.

What About Positive Data Questions?

Should you use a correlated subquery for a positive data question? No. You can if you want to, but a JOIN condition or a relationship between two tables to answer data questions is a better approach for positive data questions.

Here is an SQL query to answer a positive data question: List the employees who have received a bonus.

SELECT first_name, last_name
   FROM employee e1, payment_history ph
WHERE ph.employee_id = e1.employee_id
      AND ph.payment_type = ‘bonus’

The query uses an implicit JOIN by putting both tables in the FROM clause. The implicit JOIN is equivalent to the query:

SELECT first_name, last_name
   FROM employee e1
   INNER JOIN payment_history ph ON ph.employee_id = e1.employee_id
WHERE  ph.payment_type = ‘bonus’

The result is:

first_name last_name
Kate Perez

It is a best practice to solve data question with JOIN instead of a correlated subquery whenever possible. You should only use correlated subqueries for negative data questions or other data questions where a correlated subquery is the only way to answer it.

Performance is the biggest reason to avoid correlated subqueries. They tend to be really slow. Avoid them when you can.

A Complex Example Using a Correlated Subquery

In this example, obtain the employee names who made a higher salary in March 2018 than their average monthly salary for all previous months.

SELECT first_name, last_name
FROM employee e1, payment_history ph
WHERE e1.employee_id = ph.employee_id
AND amount_paid > = (
                                       SELECT avg(amount_paid)
                                        FROM  payment_histoty ph2
                                        WHERE ph2.employee_id = e1.employee_id
                                              AND ph2.payment_date < ‘2018-03-01’
AND month(ph.payment_date) = 3
AND year(ph.payment_date) = 2018
AND ph.payment_type = ‘salary’

You can identify the reference to the outer subquery table marked in red, so you know this is a correlated subquery. The subquery executes once per record instead of just a single time like a simple query.

Closing Words

In this lesson you learned how to identify and use correlated subqueries, a special kind of subquery, which is used under specific circumstances. The next lesson covers SQL Window Functions: very powerful functions that add new possibilities to the SQL language. Keep going, learn SQL and increase your skills!