Aggregation Functions in SQL
This lesson is connected to the previous lesson SQL GROUP BY Clause in a SELECT Query. In fact, you used some aggregation functions in the GROUP BY
examples. This lesson covers aggregation functions with and without the GROUP BY
clause.
Aggregation functions in SQL work on a set of records, producing a result for each set of records they process. For example, you can use the AVG()
function to obtain the average of a column value in a set of records, MAX()
to obtain the maximum value in a set of records, or SUM()
to calculate the sum all the values of one specific column in a set of records.
How do you create the set of records for the aggregation functions? By using the SQL GROUP BY
clause. However, you can also use the full result set of the query as a single record set.
Here is the example table. Suppose you work on a farm growing apples. The database contains a single table named apple with all the production history. The table has the following schema:
Year | Apple Variety | Number of Trees in Production | Tons Produced | Harvest day | Price per Ton | First summer storm |
---|---|---|---|---|---|---|
2020 | Red Delicious | 2000 | 102 | 06/23/2020 | 54.50 | 07/03/2020 |
2020 | Magic Green | 700 | 33 | 06/12/2020 | 62.60 | 07/03/2020 |
2020 | Red Globus | 500 | 26 | 05/30/2020 | 71.50 | 07/03/2020 |
2019 | Red Delicious | 1800 | 87 | 07/15/2019 | 52.25 | 07/12/2019 |
2019 | Magic Green | 500 | 26 | 06/28/2019 | 59.40 | 07/12/2019 |
2019 | Red Globus | 500 | 27 | 05/28/2019 | 68.00 | 07/12/2019 |
2018 | Red Delicious | 1800 | 92 | 07/02/2018 | 56.75 | 06/03/2018 |
2018 | Red Globus | 500 | 24 | 05/30/2018 | 66.00 | 06/03/2018 |
2017 | Red Delicious | 1500 | 76.5 | 07/18/2017 | 51.45 | 07/30/2017 |
2016 | Red Delicious | 1500 | 72 | 06/26/2016 | 47.60 | 06/23/2016 |
Here is a simple query using the aggregate function SUM()
to calculate the total kilos that the farm produced throughout its history:
SELECT SUM(tons_produced) AS “Total_produced_all_years”
FROM apple
There is no GROUP BY
clause, so the entire table apple is considered a single set of records, and the SUM()
aggregation function is applied just one time to all the table records, producing a single result:
Total_produced_all_years |
---|
565.5 |
Another frequently used aggregation function is COUNT()
, usually used with an “*” as a parameter indicating to count records. For example, this query reports how many years the Red Globus variety has been in production:
SELECT COUNT(*) AS “Years_producing_Red_Globus”
FROM apple
WHERE apple_variety = ‘Red Globus’
The result:
Years_producing_Red_Globus |
---|
3 |
You can put multiple aggregation functions in the same query. For example, add a calculation of the total revenue of one year and then apply the SUM()
to all the revenue values:
SELECT SUM(tons_produced) AS “Total_produced_all_years”,
SUM(tons_produced * price_per_ton) AS “Income_Revenue_all_years”
FROM apple
The result:
Total_produced_all_years (tons) | Income_Revenue_all_years ($) |
---|---|
565.5 | $ 31578.07 |
SQL Aggregation Functions Combined With GROUP BY Clause
Aggregation functions are more powerful when combined with the GROUP BY
clause.
Suppose you want to obtain the average price_per_ton
for each apple variety in the last 10 years, plus the maximum and minimum price values. Here is the query:
SELECT apple_variety
AVG(price_per_ton) AS “Average_price”,
MAX(price_per_ton) AS “Max_price”,
MIN(price_per_ton) AS “Min_price”
FROM apple
WHERE year >= 2010
GROUP BY apple_variety
The result:
Apple Variety | Average_price | Max_price | Min_price |
---|---|---|---|
Red Delicious | 52.51 | 56.75 | 47.60 |
Magic Green | 61.00 | 62.60 | 59.40 |
Red Globus | 68.50 | 71.50 | 66.00 |
This table shows several SQL standard aggregation functions:
Aggregation function name | Description |
---|---|
AVG(expression) | Average of the values returned by expression |
SUM(expression) | Sum of all the values returned by expression |
MIN(expression) | Minimum value of all the values returned by expression |
MAX(expression) | Maximum value of all the values returned by expression |
COUNT(*) | Count all the records returned in a query |
COUNT(expression) | Counts when expression is not null |
COUNT(distinct column) | Counts unique values in a column |
As mentioned in a previous lesson, starting the harvest before the first summer storm is very important for apple productivity. Suppose you want to know how many years the harvest was started before the first storm. Here is the query:
SELECT COUNT(DISTINCT year) AS “years harvesting before summer storms”
FROM apple
WHERE harvest_day < first_summer_storm
The COUNT(DISTINCT year)
function is used to avoid counting the same year multiple times.
Here is the result:
Years harvesting before summer storms |
---|
4 |
Closing Words
In this lesson you saw example queries using aggregation functions combined with GROUP BY
and without GROUP BY
. You learned the standard SQL aggregation functions. There are other non standard aggregation functions that every database provides as part of its dialect-specific SQL. You can take a look at your database documentation to identify what alternative aggregation functions you have in your SQL dialect.
The next lesson covers the HAVING
clause, which as a filtering clause for the groups of records created with the GROUP BY
clause. Keep going, learn SQL and increase your skills!