The ML.BUCKETIZE function

This document describes the ML.BUCKETIZE function, which lets you split a numerical expression into buckets.

Syntax

ML.BUCKETIZE(numerical_expression, array_split_points [, exclude_boundaries])

Arguments

ML.BUCKETIZE takes the following arguments:

  • numerical_expression: the numerical expression to bucketize.
  • array_split_points: an array of numerical values that provide the points at which to split the numerical_expression value. Provide the numerical values in order, lowest to highest. The range of possible buckets is determined by the upper and lower boundaries of the array. For example, if the array_split_points value is [1, 2, 3, 4], then there are five potential buckets that the numerical_expression value can be bucketized into.
  • exclude_boundaries: a BOOL value that determines whether the upper and lower boundaries from array_split_points are used. If TRUE, then the boundary values aren't used to create buckets. For example, if the array_split_points value is [1, 2, 3, 4] and exclude_boundaries is TRUE, then there are three potential buckets that the numerical_expression value can be bucketized into. The default value is FALSE.

Output

ML.BUCKETIZE returns a STRING value that contains the name of the bucket. The returned bucket names are in the format of bin_<bucket_index>, with bucket_index starting at 1.

Example

The following example bucketizes a numerical expression both with and without boundaries:

SELECT
  ML.BUCKETIZE(2.5, [1, 2, 3]) AS bucket,
  ML.BUCKETIZE(2.5, [1, 2, 3], TRUE) AS bucket_without_boundaries;

The output looks similar to the following:

+---------+----------------------------+
| bucket  | bucket_without_boundaries  |
+---------+----------------------------+
| bin_3   | bin_2                      |
+---------+----------------------------+

What's next