CUME_DIST

The SQL CUME_DIST() function is a window function that is used to calculate the cumulative distribution of a set of values in a group. The function returns the cumulative distribution of a value, which is the percentage of rows that have a value less than or equal to the current row value.

Syntax

The syntax for the CUME_DIST() function is as follows:

CUME_DIST() OVER (
    [PARTITION BY partition_expression, ... ]
    ORDER BY sort_expression [ASC | DESC], ...
)

The PARTITION BY clause divides the rows into partitions or groups. The ORDER BY clause specifies the order in which the rows are processed. The sort_expression can be a column name or an expression that evaluates to a column. The [ASC | DESC] specifies the sort order. By default, the sort order is ascending.

The CUME_DIST() function returns a value between 0 and 1, where 0 represents the first row and 1 represents the last row in the result set. For example, if the CUME_DIST() function returns a value of 0.5 for a particular row, it means that 50% of the rows have a value less than or equal to the value in that row.

Example

Here’s an example that demonstrates how to use the CUME_DIST() function:

SELECT 
name, salary, 
CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;

In this example, we are calculating the cumulative distribution of employee salaries in the “employees” table. The CUME_DIST() function is applied to the “salary” column and the results are ordered by ascending salary. The result set includes the employee name, salary, and the cumulative distribution of the salary.

The CUME_DIST() function is particularly useful when you need to analyze the distribution of a particular value in a group. By calculating the cumulative distribution, you can easily identify the percentile of a value in a group and compare it to other values.