NTILE

The SQL NTILE() function is a ranking function that is used to divide a result set into a specified number of equally-sized groups or “buckets”. This function can be particularly useful when you want to divide a large result set into smaller, more manageable chunks or when you want to perform statistical analysis on groups of data.

Syntax

The syntax for the NTILE() function is as follows:

NTILE ( number_of_buckets ) OVER (
    [PARTITION BY partition_expression, ... ]
    ORDER BY sort_expression [ASC|DESC], ...
)

The number_of_buckets parameter specifies the number of buckets or groups that you want to divide the result set into. This must be a positive integer greater than 0.

The PARTITION BY clause is optional and is used to partition the result set into subsets based on one or more columns. If you omit this clause, the entire result set is treated as a single partition.

The ORDER BY clause is required and is used to specify the order in which the result set should be sorted before it is divided into buckets. You can specify one or more sort expressions, separated by commas. Each sort expression can be followed by either ASC or DESC to specify the sort order.

The NTILE() function returns an integer value representing the bucket number for each row in the result set. The bucket number ranges from 1 to the specified number of buckets.

Example

For example, consider the following table called sales:

idproductamount
1A100
2B200
3A150
4C50
5A75
6B225
7C75
8B175
9A125
10C100

To divide this table into three equally-sized buckets based on the amount column, you could use the following query:

SELECT 
id, product, amount, 
NTILE(3) OVER (ORDER BY amount) AS bucket
FROM sales;

This query would produce the following result:

idproductamountbucket
4C501
5A751
7C751
1A1002
10C1002
3A1502
2B2003
8B1753
9A1253
6B2253

As you can see, the result set has been divided into three buckets, with each bucket containing roughly the same number of rows. The NTILE() function has assigned a bucket number to each row, based on the amount column.