The ML.IMPUTER function
This document describes the ML.IMPUTER
function, which lets you replace
NULL
values in a string or numerical expression. You can replace NULL
values
with the most frequently used value for string expressions, or the
mean or
median value for numerical expressions.
When used in the
TRANSFORM
clause,
the values calculated during training for mean, median, and most frequently
used value are automatically used in prediction.
Syntax
ML.IMPUTER(expression, strategy) OVER()
Arguments
ML.IMPUTER
takes the following arguments:
expression
: the numerical orSTRING
expression to impute.strategy
: aSTRING
value that specifies how to replaceNULL
values. Valid values are as follows:mean
: the mean ofexpression
. You can only use this value with numerical expressions.median
: the median ofexpression
. You can only use this value with numerical expressions.most_frequent
: the most frequent value inexpression
.
Output
ML.IMPUTER
returns a FLOAT64
(for numerical expressions) or STRING
(for string expressions) value that contains the replacement for the
NULL
value.
Examples
Example 1
The following example imputes numerical expressions:
SELECT f, ML.IMPUTER(f, 'mean') OVER () AS output FROM UNNEST([NULL, -3, -3, -3, 1, 2, 3, 4, 5]) AS f ORDER BY f;
The output looks similar to the following:
+------+--------+ | f | output | +------+--------+ | NULL | 0.75 | | -3 | -3.0 | | -3 | -3.0 | | -3 | -3.0 | | 1 | 1.0 | | 2 | 2.0 | | 3 | 3.0 | | 4 | 4.0 | | 5 | 5.0 | +------+--------+
Example 2
The following example imputes string expressions:
SELECT f, ML.IMPUTER(f, 'most_frequent') OVER () AS output FROM UNNEST([NULL, NULL, NULL, NULL, 'a', 'a', 'b', 'b', 'c', 'c', 'c']) AS f ORDER BY f;
The output looks similar to the following:
+------+--------+ | f | output | +------+--------+ | NULL | c | | NULL | c | | NULL | c | | NULL | c | | a | a | | a | a | | b | b | | b | b | | c | c | | c | c | | c | c | +------+--------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.