NTILE

The SQL NTILE() function is a ranking function that is used to divide a result set into a specified number of equally-sized groups or “buckets”. This function can be particularly useful when you want to divide a large result set into smaller, more manageable chunks or when you want to perform statistical analysis on groups of data.

Syntax

The syntax for the NTILE() function is as follows:

NTILE ( number_of_buckets ) OVER (
    [PARTITION BY partition_expression, ... ]
    ORDER BY sort_expression [ASC|DESC], ...
)

The number_of_buckets parameter specifies the number of buckets or groups that you want to divide the result set into. This must be a positive integer greater than 0.

The PARTITION BY clause is optional and is used to partition the result set into subsets based on one or more columns. If you omit this clause, the entire result set is treated as a single partition.

The ORDER BY clause is required and is used to specify the order in which the result set should be sorted before it is divided into buckets. You can specify one or more sort expressions, separated by commas. Each sort expression can be followed by either ASC or DESC to specify the sort order.

The NTILE() function returns an integer value representing the bucket number for each row in the result set. The bucket number ranges from 1 to the specified number of buckets.

Example

For example, consider the following table called sales:

id product amount
1 A 100
2 B 200
3 A 150
4 C 50
5 A 75
6 B 225
7 C 75
8 B 175
9 A 125
10 C 100

To divide this table into three equally-sized buckets based on the amount column, you could use the following query:

SELECT 
id, product, amount, 
NTILE(3) OVER (ORDER BY amount) AS bucket
FROM sales;

This query would produce the following result:

id product amount bucket
4 C 50 1
5 A 75 1
7 C 75 1
1 A 100 2
10 C 100 2
3 A 150 2
2 B 200 3
8 B 175 3
9 A 125 3
6 B 225 3

As you can see, the result set has been divided into three buckets, with each bucket containing roughly the same number of rows. The NTILE() function has assigned a bucket number to each row, based on the amount column.