Aggregate Functions in SQL

This lesson is connected to the previous lesson SQL GROUP BY Clause in a SELECT Query. In fact, you used some aggregate functions in the GROUP BY examples. This lesson covers aggregate functions with and without the GROUP BY clause.

Aggregate functions in SQL work on a set of records, producing a result for each set of records they process. For example, you can use the AVG() function to obtain the average of a column value in a set of records, MAX() to obtain the maximum value in a set of records, or SUM() to calculate the sum all the values of one specific column in a set of records.

How do you create the set of records for the aggregate functions? By using the SQL GROUP BY clause. However, you can also use the full result set of the query as a single record set.

Here is the example table. Suppose you work on a farm growing apples. The database contains a single table named apple with all the production history. The table has the following schema:

Year Apple Variety Number of Trees in Production Tons Produced Harvest day Price per Ton First summer storm
2020 Red Delicious 2000 102 06/23/2020 54.50 07/03/2020
2020 Magic Green 700 33 06/12/2020 62.60 07/03/2020
2020 Red Globus 500 26 05/30/2020 71.50 07/03/2020
2019 Red Delicious 1800 87 07/15/2019 52.25 07/12/2019
2019 Magic Green 500 26 06/28/2019 59.40 07/12/2019
2019 Red Globus 500 27 05/28/2019 68.00 07/12/2019
2018 Red Delicious 1800 92 07/02/2018 56.75 06/03/2018
2018 Red Globus 500 24 05/30/2018 66.00 06/03/2018
2017 Red Delicious 1500 76.5 07/18/2017 51.45 07/30/2017
2016 Red Delicious 1500 72 06/26/2016 47.60 06/23/2016

Here is a simple query using the aggregate function SUM() to calculate the total kilos that the farm produced throughout its history:


SELECT SUM(tons_produced) AS “Total_produced_all_years”
FROM    apple

There is no GROUP BY clause, so the entire table apple is considered a single set of records, and the SUM() aggregation function is applied just one time to all the table records, producing a single result:

Total_produced_all_years
565.5

Another frequently used aggregation function is COUNT(), usually used with an “*” as a parameter indicating to count records. For example, this query reports how many years the Red Globus variety has been in production:


SELECT COUNT(*) AS “Years_producing_Red_Globus”
FROM    apple
WHERE apple_variety = ‘Red Globus’

The result:

Years_producing_Red_Globus
3

You can put multiple aggregate functions in the same query. For example, add a calculation of the total revenue of one year and then apply the SUM() to all the revenue values:


SELECT  SUM(tons_produced) AS “Total_produced_all_years”,
        SUM(tons_produced * price_per_ton) AS “Income_Revenue_all_years”
FROM    apple

The result:

Total_produced_all_years (tons) Income_Revenue_all_years ($)
565.5 $ 31578.07

SQL Aggregate Functions Combined With GROUP BY Clause

Aggregate functions are more powerful when combined with the GROUP BY clause.

Suppose you want to obtain the average price_per_ton for each apple variety in the last 10 years, plus the maximum and minimum price values. Here is the query:


SELECT apple_variety
       AVG(price_per_ton) AS “Average_price”,
       MAX(price_per_ton) AS “Max_price”,
       MIN(price_per_ton) AS “Min_price”
FROM   apple
WHERE  year >= 2010
GROUP BY apple_variety

The result:

Apple Variety Average_price Max_price Min_price
Red Delicious 52.51 56.75 47.60
Magic Green 61.00 62.60 59.40
Red Globus 68.50 71.50 66.00

This table shows several SQL standard aggregate functions:

Aggregation function name Description
AVG(expression) Average of the values returned by expression
SUM(expression) Sum of all the values returned by expression
MIN(expression) Minimum value of all the values returned by expression
MAX(expression) Maximum value of all the values returned by expression
COUNT(*) Count all the records returned in a query
COUNT(expression) Counts when expression is not null
COUNT(distinct column) Counts unique values in a column

As mentioned in a previous lesson, starting the harvest before the first summer storm is very important for apple productivity. Suppose you want to know how many years the harvest was started before the first storm. Here is the query:


SELECT COUNT(DISTINCT year) AS “years harvesting before summer storms”
FROM   apple
WHERE  harvest_day < first_summer_storm

The COUNT(DISTINCT year) function is used to avoid counting the same year multiple times.

Here is the result:

Years harvesting before summer storms
4

Closing Words

In this lesson you saw example queries using aggregate functions combined with GROUP BY and without GROUP BY. You learned the standard SQL aggregate functions. There are other non standard aggregate functions that every database provides as part of its dialect-specific SQL. You can take a look at your database documentation to identify what alternative aggregate functions you have in your SQL dialect.

The next lesson covers the HAVING clause, which as a filtering clause for the groups of records created with the GROUP BY clause. Keep going, learn SQL and increase your skills!