The ML.BUCKETIZE function
This document describes the ML.BUCKETIZE
function, which lets you split
a numerical expression into buckets.
Syntax
ML.BUCKETIZE(numerical_expression, array_split_points [, exclude_boundaries])
Arguments
ML.BUCKETIZE
takes the following arguments:
numerical_expression
: the numerical expression to bucketize.array_split_points
: an array of numerical values that provide the points at which to split thenumerical_expression
value. Provide the numerical values in order, lowest to highest. The range of possible buckets is determined by the upper and lower boundaries of the array. For example, if thearray_split_points
value is[1, 2, 3, 4]
, then there are five potential buckets that thenumerical_expression
value can be bucketized into.exclude_boundaries
: aBOOL
value that determines whether the upper and lower boundaries fromarray_split_points
are used. IfTRUE
, then the boundary values aren't used to create buckets. For example, if thearray_split_points
value is[1, 2, 3, 4]
andexclude_boundaries
isTRUE
, then there are three potential buckets that thenumerical_expression
value can be bucketized into. The default value isFALSE
.
Output
ML.BUCKETIZE
returns a STRING
value that contains the name of the bucket.
The returned bucket names are in the format of bin_<bucket_index>
, with
bucket_index
starting at 1
.
Example
The following example bucketizes a numerical expression both with and without boundaries:
SELECT ML.BUCKETIZE(2.5, [1, 2, 3]) AS bucket, ML.BUCKETIZE(2.5, [1, 2, 3], TRUE) AS bucket_without_boundaries;
The output looks similar to the following:
+---------+----------------------------+ | bucket | bucket_without_boundaries | +---------+----------------------------+ | bin_3 | bin_2 | +---------+----------------------------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.