SQL Tutorial Home

SQL Window Function Examples

SQL Tutorial Home

This lesson presents SQL window function examples. SQL Window functions are powerful functions that compute their result based on a set of rows (the window), instead of computing based on a single row. They are similar to SQL aggregate functions.

One important difference between window functions and aggregate functions is when you use aggregate functions with the GROUP BY clause, you lose the individual row. This prevents you from mixing attributes of the individual row with results of the aggregate function. Window functions do not have this restriction, enabling you to mix the results with record level fields. This is really good news for SQL developers.

Using SQL Window Functions

In the SQL Window Functions lesson you learned the syntax and clauses of window functions. This lesson focuses on examples. One important point to note about window functions is the placement in the SQL query. You can invoke a window function in the SELECT list statement or in the ORDER BY clause of a query, but not in the WHERE, GROUP BY or HAVING clauses. All of these examples have a window function in the column list.

Here is an example using the table employee.

employee_id	full_name	department	salary
100	Mary Johns	SALES	1000.00
101	Sean Moldy	IT	1500.00
102	Peter Dugan	SALES	2000.00
103	Lilian Penn	SALES	1700.00
104	Milton Kowarsky	IT	1800.00
105	Mareen Bisset	ACCOUNTS	1200.00
106	Airton Graue	ACCOUNTS	1100.00

The RANK() function is one of the simplest window functions. It returns the position of any row inside the partition. To obtain the rank salary for each department:


SELECT	
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
  department,
  employee_id,
  full_name,
  salary
FROM employee;

Here is the resulting department employee ranking by salary:

dept_rank	department	employee_id	full_name	salary
1	ACCOUNTS	105	Mareen Bisset	1200.00
2	ACCOUNTS	106	Airton Graue	1100.00
1	IT	104	Milton Kowarsky	1800.00
2	IT	101	Sean Moldy	1500.00
1	SALES	102	Peter Dugan	2000.00
2	SALES	103	Lilian Penn	1700.00
3	SALES	100	Mary Johns	1000.00

Think about how to change the query if you want the same report but with all number 1 ranking employees first, then all number 2 employees, and so on. This change is left to the reader as practice.

Another interesting example is a query to obtain a metric for every employee about how close they are from the top salary of their department. Here is the formula to obtain this metric:

employee_salary / max_salary_in_dept

This query orders all employees by the calculated metric, sorted by employees with the lowest salary related to the max salary of their department.


SELECT 
   employee_id,
   full_name,
   department,
   salary,
   salary/MAX(salary) OVER (PARTITION BY department ORDER BY salary DESC)
			AS salary_metric
FROM employee

The result:

employee_id	full_name	department	salary	salary_metric
100	Mary Johns	SALES	1000.00	0.5
101	Sean Moldy	IT	1500.00	0.83
103	Lilian Penn	SALES	1700.00	0.85
106	Airton Graue	ACCOUNTS	1100.00	0.91
104	Milton Kowarsky	IT	1800.00	1.0
105	Mareen Bisset	ACCOUNTS	1200.00	1.0
102	Peter Dugan	SALES	2000.00	1.0

SQL Data Queries Oriented To Window Functions

These two query examples showcase the real power of window functions. The examples are based on a simple database table train_schedule:

Train_id	Station	Time
110	San Francisco	10:00:00
110	Redwood City	10:54:00
110	Palo Alto	11:02:00
110	San Jose	12:35:00
120	San Francisco	11:00:00
120	Redwood City	11:54:00
120	Palo Alto	12:04:00
120	San Jose	13:30:00

The first task is to add a new column called time_to_next_station. You can calculate this value by subtracting the station times for pairs of contiguous stations. Calculating this value without window functions can be very complicated. Using the window function LEAD simplifies the task. Here is the query to obtain the time of the next station using the LEAD station:


SELECT
	train_id,
	station,
	time as "station_time",
	lead(time) OVER (PARTITION BY train_id ORDER BY time) - time
		AS time_to_next_station
FROM train_schedule
ORDER BY 1 , 3;

The LEAD() window function is used to obtain the value of a column for the next row in the window. Note the calculation of the metric is done using an expression combining an individual column(time) with a window function(lead). This combination is not valid with aggregate functions.

Here are the results:

train_id	station	time	time_to_next_station
110	San Francisco	10:00:00	00:54:00
110	Redwood City	10:54:00	00:08:00
110	Palo Alto	11:02:00	01:33:00
110	San Jose	12:35:00	—
120	San Francisco	11:00:00	00:54:00
120	Redwood City	11:54:00	00:10:00
120	Palo Alto	12:04:00	01:26:00
120	San Jose	13:30:00	—

In this example, you add a new elapsed_travel_time column representing the elapsed time of the trip until the current station. You use the MIN() window function to obtain the starting time of the trip and subtract the current station time.


SELECT 	
	train_id,
	station,
	time as "station_time",
	time - min(time) OVER (PARTITION BY train_id ORDER BY time) 	
								AS elapsed_travel_time,
	lead(time) OVER (PARTITION BY train_id ORDER BY time) - time
								AS time_to_next_station
FROM train_schedule;

Here is the result:

train_id	station	time	elapsed_travel_time	time_to_next_station
110	San Francisco	10:00:00	00:00:00	00:54:00
110	Redwood City	10:54:00	00:54:00	00:08:00
110	Palo Alto	11:02:00	01:02:00	01:33:00
110	San Jose	12:35:00	02:35:00	—
120	San Francisco	11:00:00	00:00:00	00:54:00
120	Redwood City	11:54:00	00:54:00	00:10:00
120	Palo Alto	12:04:00	01:04:00	01:26:00
120	San Jose	13:30:00	02:30:00	—

Closing Words

Window functions are one of the least known features of the SQL language, though among the most powerful and flexible. This article touches on the basics of window functions. There are clauses, such as PARTITION BY and WINDOW FRAME, and topics to continue to explore. Keep going, learn SQL and increase your skills!