SQL Window Function Examples

This lesson presents SQL window function examples. SQL Window functions are powerful functions that compute their result based on a set of rows (the window), instead of computing based on a single row. They are similar to SQL aggregate functions.

One important difference between window functions and aggregate functions is when you use aggregate functions with the GROUP BY clause, you lose the individual row. This prevents you from mixing attributes of the individual row with results of the aggregate function. Window functions do not have this restriction, enabling you to mix the results with record level fields. This is really good news for SQL developers.

Using SQL Window Functions

In the SQL Window Functions lesson you learned the syntax and clauses of window functions. This lesson focuses on examples. One important point to note about window functions is the placement in the SQL query. You can invoke a window function in the SELECT list statement or in the ORDER BY clause of a query, but not in the WHERE, GROUP BY or HAVING clauses. All of these examples have a window function in the column list.

Here is an example using the table employee.

employee_id full_name department salary
100 Mary Johns SALES 1000.00
101 Sean Moldy IT 1500.00
102 Peter Dugan SALES 2000.00
103 Lilian Penn SALES 1700.00
104 Milton Kowarsky IT 1800.00
105 Mareen Bisset ACCOUNTS 1200.00
106 Airton Graue ACCOUNTS 1100.00

The RANK() function is one of the simplest window functions. It returns the position of any row inside the partition. To obtain the rank salary for each department:


SELECT	
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
  department,
  employee_id,
  full_name,
  salary
FROM employee;

Here is the resulting department employee ranking by salary:

dept_rank department employee_id full_name salary
1 ACCOUNTS 105 Mareen Bisset 1200.00
2 ACCOUNTS 106 Airton Graue 1100.00
1 IT 104 Milton Kowarsky 1800.00
2 IT 101 Sean Moldy 1500.00
1 SALES 102 Peter Dugan 2000.00
2 SALES 103 Lilian Penn 1700.00
3 SALES 100 Mary Johns 1000.00

Think about how to change the query if you want the same report but with all number 1 ranking employees first, then all number 2 employees, and so on. This change is left to the reader as practice.

Another interesting example is a query to obtain a metric for every employee about how close they are from the top salary of their department. Here is the formula to obtain this metric:

employee_salary / max_salary_in_dept

This query orders all employees by the calculated metric, sorted by employees with the lowest salary related to the max salary of their department.


SELECT 
   employee_id,
   full_name,
   department,
   salary,
   salary/MAX(salary) OVER (PARTITION BY department ORDER BY salary DESC)
			AS salary_metric
FROM employee

The result:

employee_id full_name department salary salary_metric
100 Mary Johns SALES 1000.00 0.5
101 Sean Moldy IT 1500.00 0.83
103 Lilian Penn SALES 1700.00 0.85
106 Airton Graue ACCOUNTS 1100.00 0.91
104 Milton Kowarsky IT 1800.00 1.0
105 Mareen Bisset ACCOUNTS 1200.00 1.0
102 Peter Dugan SALES 2000.00 1.0

SQL Data Queries Oriented To Window Functions

These two query examples showcase the real power of window functions. The examples are based on a simple database table train_schedule:

Train_id Station Time
110 San Francisco 10:00:00
110 Redwood City 10:54:00
110 Palo Alto 11:02:00
110 San Jose 12:35:00
120 San Francisco 11:00:00
120 Redwood City 11:54:00
120 Palo Alto 12:04:00
120 San Jose 13:30:00

The first task is to add a new column called time_to_next_station. You can calculate this value by subtracting the station times for pairs of contiguous stations. Calculating this value without window functions can be very complicated. Using the window function LEAD simplifies the task. Here is the query to obtain the time of the next station using the LEAD station:


SELECT
	train_id,
	station,
	time as "station_time",
	lead(time) OVER (PARTITION BY train_id ORDER BY time) - time
		AS time_to_next_station
FROM train_schedule
ORDER BY 1 , 3;

The LEAD() window function is used to obtain the value of a column for the next row in the window. Note the calculation of the metric is done using an expression combining an individual column(time) with a window function(lead). This combination is not valid with aggregate functions.

Here are the results:

train_id station time time_to_next_station
110 San Francisco 10:00:00 00:54:00
110 Redwood City 10:54:00 00:08:00
110 Palo Alto 11:02:00 01:33:00
110 San Jose 12:35:00
120 San Francisco 11:00:00 00:54:00
120 Redwood City 11:54:00 00:10:00
120 Palo Alto 12:04:00 01:26:00
120 San Jose 13:30:00

In this example, you add a new elapsed_travel_time column representing the elapsed time of the trip until the current station. You use the MIN() window function to obtain the starting time of the trip and subtract the current station time.


SELECT 	
	train_id,
	station,
	time as "station_time",
	time - min(time) OVER (PARTITION BY train_id ORDER BY time) 	
								AS elapsed_travel_time,
	lead(time) OVER (PARTITION BY train_id ORDER BY time) - time
								AS time_to_next_station
FROM train_schedule;

Here is the result:

train_id station time elapsed_travel_time time_to_next_station
110 San Francisco 10:00:00 00:00:00 00:54:00
110 Redwood City 10:54:00 00:54:00 00:08:00
110 Palo Alto 11:02:00 01:02:00 01:33:00
110 San Jose 12:35:00 02:35:00
120 San Francisco 11:00:00 00:00:00 00:54:00
120 Redwood City 11:54:00 00:54:00 00:10:00
120 Palo Alto 12:04:00 01:04:00 01:26:00
120 San Jose 13:30:00 02:30:00

Closing Words

Window functions are one of the least known features of the SQL language, though among the most powerful and flexible. This article touches on the basics of window functions. There are clauses, such as PARTITION BY and WINDOW FRAME, and topics to continue to explore. Keep going, learn SQL and increase your skills!