SQL Subqueries: The Super Powerful SQL Feature

In this lesson you explore the Subqueries, the super powerful and flexible feature of the SQL language. SQL Subqueries can be used in the SELECT, INSERT, UPDATE, and DELETE statements. This lesson focuses on using subqueries in the SELECT statement.

The first point to address is where you put the subquery in the SELECT statement. It can go in the WHERE, FROM, and HAVING clauses, or even inside the SELECT list of column. The most typical is the WHERE clause.

SQL Subqueries in the WHERE Clause

From a syntax point of view, a SQL Subquery is a SELECT statement enclosed in parenthesis, and like every SELECT, it can return 0, 1, or multiple rows. The number of records returned by a subquery is very important. The operators you use to create a condition for a subquery varies depending on how many records the subquery returns. The subquery operator rules are shown in the table:

Number of records the subquery returns Operator allowed
1 record = , > , >=, <, <= , <>
0, 1 or many records IN, EXISTS

Here is the example table. Suppose you work on a farm growing apples. The database contains a single table named apple with all the production history. The table has the following schema:

Year Apple Variety Number of Trees in Production Tons Produced Harvest day Price per Ton First summer storm
2020 Red Delicious 2000 102 06/23/2020 54.50 07/03/2020
2020 Magic Green 700 33 06/12/2020 62.60 07/03/2020
2020 Red Globus 500 26 05/30/2020 71.50 07/03/2020
2019 Red Delicious 1800 87 07/15/2019 52.25 07/12/2019
2019 Magic Green 500 26 06/28/2019 59.40 07/12/2019
2019 Red Globus 500 27 05/28/2019 68.00 07/12/2019
2018 Red Delicious 1800 92 07/02/2018 56.75 06/03/2018
2018 Red Globus 500 24 05/30/2018 66. 06/03/2018
2017 Red Delicious 1500 76.5 07/18/2017 51.45 07/30/2017
2016 Red Delicious 1500 72 06/26/2016 47.60 06/23/2016

Suppose you want to obtain the year and the tons_produced for Red Delicious variety and the year that price of Red Delicious touches its maximum value in the history.

Here is the query and subquery in the WHERE clause:


SELECT year,
       tons_produced 
FROM   apple
WHERE  price_per_ton = ( SELECT MAX(price_per_ton) 
                         FROM apple WHERE variety = ‘Red Delicious’
                       )
      AND variety = 'Red Delicious'

Using MAX() constrains the subquery to return a single record, enabling you to use the = operator in the subquery comparison condition. In a case where a subquery returns 0 or multiple rows and an operator is used, the database returns an error.

Here is the result:

year tons_produced
2018 92

Now you want to obtain the year, tons produced, number of trees and the metric tons_per_tree for those years when the tons_per_tree was less than the average for Red Delicious. You can use a subquery to obtain the average of tons_per_tree for Red Delicious, and then compare it with every Red Delicious record tons_per_tree metric.

Here is the query and subquery in the WHERE clause:


SELECT  year, number_of_trees, tons_produced, 
        tons_produced / number_of_trees AS tons_per_tree  
FROM    apple
WHERE   (tons_produced / number_of_trees) < (
                               SELECT AVG(tons_produced / number_of_trees) 
                               FROM apple 
                               WHERE apple_variety = 'Red Delicious'
                                )  
  AND   apple_variety = 'Red Delicious'

The results:

Year number_of_trees tons_produced tons_per_tree
2019 1800 87 0.0483
2016 1500 72 0.0480

So far the subqueries have returned exactly one record. In the next example, the subquery returns several records. The goal is to obtain the same attributes as the previous Red Delicious query for the years where the average tons_per_dollar is > 0.05.

Here is the query:


SELECT  year, number_of_trees, tons_produced, 
        tons_produced / number_of_trees AS tons_per_tree  
FROM    apple
WHERE   year IN (
                  SELECT      year
                    FROM      apple 
                   WHERE      apple_variety = 'Red Delicious'
                  GROUP BY year
                  HAVING AVG(tons_produced / number_of_trees) > 0.05
                )  
  AND    apple_variety = 'Red Delicious'

This subquery can return more than 1 record, so you use the IN operator in the condition.

Here are the results:

Year Number of Trees Tons_produced Tons_per_tree
2020 2000 102 0.051
2018 1800 92 0.051
2017 1500 76 0.051

Now you want to obtain the years where you did not produce the Red Globus variety. For this query, use the EXISTS unary operator. A unary means the operator only needs one operand, which in this case, is the subquery. Here is the form for the EXISTS condition:

WHERE EXISTS ( subquery )

The EXISTS condition returns TRUE if the subquery returns any row or FALSE if the subquery returns 0 rows (empty set).

Here is the query to obtain the years where Red Globus was not produced:


SELECT DISTINCT year 
FROM   apple a1
WHERE  NOT EXISTS (
                    SELECT       year
                      FROM       apple a2 
                     WHERE       a2.apple_variety = 'Red Globus'
                       AND       a2.year = a1.year
                 )

Two items to note in this query:

  1. The DISTINCT clause is used to remove duplicate records from the result.
  2. The condition a2.year = a1.year inside the subquery is comparing the year column of a record from the subquery against the year column of a record in the outer query. This is called a Correlated Subquery as it uses columns in the outer query in the WHERE clause of the subquery.

Here is the result:

Year
2016
2017

SQL Subqueries In the FROM Clause

This example uses a subquery in the FROM clause of the SELECT to produce an temporary table called red_delicious_prod.

Here is the query:


SELECT
 red_delicious_prod.year, 
 red_delicious_prod.tons_produced, 
 red_delicious_prod.price_per_ton, 
 red_delicious_prod.tons_produced * red_delicious_prod.price_per_ton AS revenue 
FROM     ( 
                  SELECT year, tons_produced, price_per_ton 
                  FROM    apple 
                  WHERE apple_variety = ‘Red Delicious’ 
         ) red_delicious_prod

The outer query references the temporary red_delicious_prod table created on the fly during the query execution. When the query is complete, the red_delicious_prod table is no longer available.

Closing Words

In this lesson you learned to use subqueries in the SELECT statement. You used subqueries with different operators in the WHERE clause and in the FROM clause. There are other places in the SELECT statement where you can put a subquery. For example you can put a subquery in the SELECT list, or in the HAVING clause. The next lesson covers details of the DISTINCT clause. Keep going, learn SQL, increase your skills!