SQL Window Function Introduction

MySQL window functions are a powerful and sophisticated feature in SQL that allow for performing calculations across a set of rows that are somehow related to the current row. These functions enable detailed row-by-row analysis without grouping or collapsing the result set as standard aggregate functions typically do.

In this context, the term "dataset" refers to the collection of data over which the function is applied. This can be the entire table in a database, a subset of that table, or even a derived table formed by operations like `JOIN`, `WHERE`, or `ORDER BY`. When a window function is applied, each row is treated as the "current row," and the surrounding rows that are relevant to that calculation form what is known as the "window." This window can shift or remain fixed depending on the function and clause used, but it essentially defines the set of rows the function uses to compute its value for the current row.

Window functions behave similarly to aggregate functions in that they compute values such as sums, counts, or averages. However, unlike aggregate functions that return a single value for a group, window functions return a value for each row—maintaining the row-level granularity of the result.

Types of Window Functions in MySQL

Window functions can be broadly classified into:

1. Aggregate window functions – These are traditional aggregate functions like `SUM()`, `AVG()`, `MAX()`, `MIN()`, and `COUNT()`, used within a window context to provide rolling totals, moving averages, and similar computations across a defined set of rows.

2. Non-aggregate window functions – These do not summarize data but instead provide metadata-like row positions or values from specific positions in the window. Non-aggregate window functions can be further split into:

  • Ranking functions, such as `ROW_NUMBER()`, `RANK()`, and `DENSE_RANK()`, which assign a rank or order to rows based on a specific column or condition.

  • Value functions, like `LEAD()` and `LAG()`, which fetch a value from a row before or after the current row within the window frame.

Syntax for Using Window Functions

To use window functions in MySQL, the following general syntax structure is used:

SELECT column1, column2,

window_function() OVER ([PARTITION BY ...] ORDER BY ...) AS alias

FROM table_name;

The `OVER` clause is essential—it defines the window or the set of rows the function should operate over. You can optionally use `PARTITION BY` to divide the dataset into partitions and apply the function independently to each partition. `ORDER BY` is used within the window to define the order of rows for processing. We will see each of it in action.

Practical Example of using Window Functions

Let us consider a dataset of salaries with records and columns as such - emp_no, salary, from_date, and to_date. As you can see from the picture below, there are multiple entries for one employee number.

However we can't really guess how many records are here from this. Now to assign a row number to each entry, we can make use of ROW_NUMBER() WINDOW FUNCTION as such -

SELECT

emp_no,

salary,

ROW_NUMBER() OVER() AS record_number

FROM salaries;

Here the window specification is None that is we have an empty `OVER` clause. Here row_number() will compute the result by taking into account all rows in table. Then our output will look as below, where every record has got a row number assigned starting from 1 and we named this new column as record_number -

By performing Order By in this newly created Row number column, we can view the total number of records there are in the dataset. There are 9,67,330 records in our dataset -

SELECT

emp_no, salary, ROW_NUMBER() OVER() AS record_number

FROM salaries

ORDER BY record_number DESC

LIMIT 10;

Now that you know the total number of records and have assigned row number to every entry, you want to see the number of entries per employee number, that is you want to group by the employee number and see how many entries are there for every unique employee number. This you can do by using `PARTITION BY` in the `OVER` Clause. Let's see -

SELECT

emp_no, salary,

ROW_NUMBER() OVER() AS record_number,

ROW_NUMBER() OVER(PARTITION BY emp_no) AS record_partition_per_employee

FROM salaries;

Observe the output. The last column named `record_partition_per_employee` gives the Row number partitioned by every unique employee number. Thus 10001 employee number has 17 entries and then the count starts again for 10002 employee number which goes till 6 and so on. -

Now, you may want to view the data ranked as per the Descending order of salary for every unique employee number. So here you will have to partition by every employee and then do the ordering in descending order for every employee and then do the ranking in each of the partitions. This is how you can do it -

SELECT

emp_no, salary,

ROW_NUMBER() OVER() AS record_number,

ROW_NUMBER() OVER(PARTITION BY emp_no) AS record_partition_per_employee,

ROW_NUMBER() OVER(PARTITION BY emp_no ORDER BY salary DESC) AS record_partition_per_employee_order_by_salary

FROM salaries;

Observe the output now. The last column shows the ranking for every employee ordered by salary from highest to lowest for every individual employee. By default the Order By does ordering in ascending order. Thus we will have to specify DESC.

As you can guess by now, we can use multiple window functions in one query and can fetch different results. Also, you have noticed thet window functions does not remove any rows from the output and preserves the detail of the original table.

Significance of use of OVER Clause with Window Functions -

In MySQL, a function is identified and treated as a window function only when it is used in conjunction with the `OVER()` clause. This is a fundamental rule in SQL syntax - without the `OVER()` clause, the function is not considered a window function and will either behave differently or produce a syntax error depending on the function type.

Take `ROW_NUMBER()` as an example. It is inherently a window function, meaning it does not make sense or operate properly outside of a windowing context. SQL knows to treat `ROW_NUMBER()` as a window function specifically because it appears with the `OVER()` clause. This clause defines the window—the subset of rows the function will process for each individual row.

When you write:

ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)

You are telling MySQL:

  • For each `department`, order the rows by `salary` in descending order.

  • Then assign sequential numbers starting from 1, based on this order, for each row in that partition.

This window of rows—determined by the `PARTITION BY` and `ORDER BY` clauses inside `OVER()` - is what makes the function a window function. Without this clause, SQL cannot understand which "window" to apply the function over, and the query will be incomplete or invalid. Additionally, it is critical to alias the output of a window function for interpretation or further use. This is done using `AS` to name the column in the result.

Multiple Window Functions should be used Carefully -

As we know, MySQL allows you to use multiple window functions in a single query, but caution is needed. The key is to ensure that all window functions use the same `PARTITION BY` and `ORDER BY` clause, or else your results may become misleading or inconsistent. This is because different window specifications can generate values based on different groupings or sort orders, leading to logical mismatches in interpretation.

Also, to keep the final output organized and user-friendly, it is often useful to use a regular `ORDER BY` at the end of the query to sort the entire result set meaningfully.

Let's understand a complex example related to the same. The below query retrieves the top 10 highest-paid employees from each department. It ranks employees within each department by salary using ROW_NUMBER(), calculates the average department salary, and filters to include only those with a rank of 10 or less. This is done by wrapping the main query in a subquery and applying a WHERE clause on the computed rank. The final result is sorted by department and rank.

SELECT * FROM (

SELECT

d.dept_name as department, e.first_name, s.salary,

ROW_NUMBER() OVER(PARTITION BY d.dept_name ORDER BY s.salary DESC) AS rank_in_dept,

AVG(s.salary) OVER(PARTITION BY d.dept_name) AS avg_dept_salary

FROM dept_emp de

JOIN employees e ON de.emp_no = e.emp_no

JOIN departments d ON d.dept_no = de.dept_no

JOIN salaries s ON e.emp_no = s.emp_no

) AS ranked

WHERE rank_in_dept <= 10

ORDER BY department, rank_in_dept;

Here, both window functions (`ROW_NUMBER` and `AVG`) share the same `PARTITION BY department`, so the result is coherent. The final `ORDER BY` outside the SELECT clause arranges the entire output.

Hey! You have just witnessed the use of Aggregate window function - AVG(). Also, you have seen how the window function row_number() concept can be utilized to fetch the number of rows we want. Coming up next, we will explore more on Window functions, specifically - RANK(), DENSE_RANK(), LAG(), and LEAD()