The ML.GENERATE_TEXT function
This document describes the ML.GENERATE_TEXT
function, which lets you perform
generative natural language tasks by using text from BigQuery
standard tables, or
unstructured data from BigQuery
object tables.
The function works by sending requests to a BigQuery ML
remote model
that represents a hosted Vertex AI model, and then returning the
model's response. The hosted Vertex AI model can be a
built-in Vertex AI text or multimodal model,
or an
Anthropic Claude model.
Several of the ML.GENERATE_TEXT
function's arguments provide the
parameters that shape the hosted Vertex AI model's response.
You can use the ML.GENERATE_TEXT
function to perform tasks such as
classification, sentiment analysis, image captioning, and transcription. For
more information about the types of tasks that the Vertex AI
models can perform, see the Vertex AI documentation for the model that you are using.
Prompt design can strongly affect the responses returned by the Vertex AI model. For more information, see Design multimodal prompts or Design text prompts.
Input
The input you can provide to ML.GENERATE_TEXT
varies depending on the
Vertex AI model that you use with your remote model.
Input for gemini-1.5
models
When you use the gemini-1.5-flash
or gemini-1.5-pro
model, you can
analyze content from an object table using prompt data you provide as a
function argument, or you can generate text by providing prompt data in a
query or from a column in a standard table. If you are
using content from an object table, it must meet the following requirements:
- Content must be in one of the supported formats that are described in the Gemini API model mimeType parameter.
- The supported maximum video length is 2 minutes. If the video is longer than 2 minutes, ML.GENERATE_TEXT only returns results for the first 2 minutes.
Input for a gemini-1.0-pro-vision
model
When you use the gemini-1.0-pro-vision
model, you can analyze visual
content from an object table using
prompt data you provide as a function argument. The visual
content must meet the following requirements:
- Content must be in one of the supported image or video formats that are described in the Gemini API model mimeType parameter.
- Each piece of content must be no greater than 20 MB.
- The supported maximum video length is 2 minutes. If the video is longer than 2 minutes, ML.GENERATE_TEXT only returns results for the first 2 minutes.
Input for Claude models
When you use a Claude model, you can generate text by providing prompt data in a query or from a column in a standard table.
Input for text models
When you use the gemini-1.0-pro, text-bison, text-bison-32k, or text-unicorn models, you can generate text by providing prompt data in a query or from a column in a standard table.
Syntax
ML.GENERATE_TEXT
syntax differs depending on the Vertex AI
model that your remote model targets. Choose the option appropriate for your
use case.
gemini-1.5-flash
Analyze text data from a standard table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, ground_with_google_search AS ground_with_google_search]
    [, safety_settings AS safety_settings]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,2.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
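As a minimal illustration of this syntax, the following query assumes a remote model named `mydataset.flash_model` over gemini-1.5-flash and a table `mydataset.prompts` that has a prompt column; both names are placeholders:
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.flash_model`,
  TABLE `mydataset.prompts`,
  STRUCT(
    256 AS max_output_tokens,
    0.2 AS temperature,
    TRUE AS flatten_json_output,
    TRUE AS ground_with_google_search));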
Analyze unstructured data from an object table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  TABLE `project_id.dataset.table`,
  STRUCT(
    prompt AS prompt
    [, max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, safety_settings AS safety_settings]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input. The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.
- prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,2.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
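As a minimal illustration of this syntax, the following query assumes a remote model named `mydataset.flash_model` and an object table `mydataset.report_documents` whose Cloud Storage bucket is in the same project; both names are placeholders:
SELECT uri, ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.flash_model`,
  TABLE `mydataset.report_documents`,
  STRUCT(
    'Summarize this document in two sentences.' AS prompt,
    TRUE AS flatten_json_output));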
gemini-1.5-pro
Analyze text data from a standard table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, ground_with_google_search AS ground_with_google_search]
    [, safety_settings AS safety_settings]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,2.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
Analyze unstructured data from an object table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  TABLE `project_id.dataset.table`,
  STRUCT(
    prompt AS prompt
    [, max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, safety_settings AS safety_settings]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input. The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.
- prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,2.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
gemini-1.0-pro
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, ground_with_google_search AS ground_with_google_search]
    [, safety_settings AS safety_settings]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40. A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value. For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
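As a minimal illustration of this syntax, the following query assumes a remote model named `mydataset.gemini_pro_model` over gemini-1.0-pro; the model name and prompt are placeholders:
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_pro_model`,
  (SELECT 'Write a one-sentence summary of what a data warehouse is.' AS prompt),
  STRUCT(
    100 AS max_output_tokens,
    20 AS top_k,
    0.8 AS top_p,
    0.5 AS temperature,
    TRUE AS flatten_json_output));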
gemini-1.0-pro-vision
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  TABLE `project_id.dataset.table`,
  STRUCT(
    prompt AS prompt
    [, max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, safety_settings AS safety_settings]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input. The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.
- prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. This value must be in the range [1,2048]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 2048.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 32. A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value. For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.4.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
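As a minimal illustration of this syntax, the following query assumes a remote model named `mydataset.vision_model` and an object table `mydataset.product_images`; both names are placeholders:
SELECT uri, ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.vision_model`,
  TABLE `mydataset.product_images`,
  STRUCT(
    'Describe the product shown in this image.' AS prompt,
    0.2 AS temperature,
    TRUE AS flatten_json_output));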
Claude
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, flatten_json_output AS flatten_json_output]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value. A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value. For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
Details
The model and input table must be in the same region.
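As a minimal illustration of this syntax, the following query assumes a remote model named `mydataset.claude_model` over a Claude model; the model name and prompt are placeholders. Only max_output_tokens, top_k, top_p, and flatten_json_output are available for Claude models:
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.claude_model`,
  (SELECT 'Summarize the benefits of columnar storage in one paragraph.' AS prompt),
  STRUCT(
    512 AS max_output_tokens,
    0.9 AS top_p,
    TRUE AS flatten_json_output));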
text-bison
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,1024]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40. A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value. For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
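As a minimal illustration of this syntax, the following query assumes a remote model named `mydataset.bison_model` over text-bison and a table `mydataset.reviews` with a review_text column; all of these names are placeholders:
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.bison_model`,
  (SELECT CONCAT('Classify the sentiment of this review as positive or negative: ', review_text) AS prompt
   FROM `mydataset.reviews`),
  STRUCT(
    50 AS max_output_tokens,
    0.0 AS temperature,
    TRUE AS flatten_json_output));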
text-bison-32k
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40. A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value. For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
text-unicorn
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
  )
)
Arguments
ML.GENERATE_TEXT
takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,1024]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40. A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value. For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
Output
ML.GENERATE_TEXT
returns the input table plus the following columns:
Gemini API models
- ml_generate_text_result: the JSON response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element. The safety attributes are in the safety_ratings element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
- ml_generate_text_grounding_result: a STRING value that contains a list of the grounding sources that the model used to gather additional information. This column is returned when both flatten_json_output and ground_with_google_search are TRUE.
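For example, the following query (the model and table names are placeholders) returns only the rows whose calls failed, by filtering on a non-empty status value:
SELECT prompt, ml_generate_text_status
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(TRUE AS flatten_json_output))
WHERE ml_generate_text_status != '';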
Claude models
- ml_generate_text_result: the JSON response from the projects.locations.endpoints.rawPredict call to the model. The generated text is in the content element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
PaLM API models
- ml_generate_text_result: the JSON response from the projects.locations.endpoints.predict call to the model. The generated text is in the content element. The safety attributes are in the safetyAttributes element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
Examples
Text analysis
Example 1
This example shows a request that provides a single prompt.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.text_model`, (SELECT 'What is the purpose of dreams?' AS prompt));
Example 2
This example shows a request with the following characteristics:
- Provides prompt data from a table column that's named
prompt
. - Flattens the JSON response into separate columns.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.text_model`, TABLE `mydataset.prompt_table`, STRUCT(TRUE AS flatten_json_output));
Example 3
This example shows a request that provides prompt data from a table
column named question
that is aliased as prompt
.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.text_model`, (SELECT question AS prompt FROM `mydataset.prompt_table`));
Example 4
This example shows a request that concatenates strings and a table column to provide the prompt data.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.text_model`, ( SELECT CONCAT( 'Classify the sentiment of the following text as positive or negative. Text: ', input_column, ' Sentiment:') AS prompt FROM `mydataset.input_table`));
Example 5
This example shows a request that excludes model responses that contain
the strings Golf
or football
.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.text_model`, TABLE `mydataset.prompt_table`, STRUCT(TRUE AS flatten_json_output, ['Golf', 'football'] AS stop_sequences));
Example 6
This example shows a request to a Gemini model with the following characteristics:
- Provides prompt data from a table column that's named
prompt
. - Flattens the JSON response into separate columns.
- Retrieves and returns public web data for response grounding.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.gemini_model`, TABLE `mydataset.prompt_table`, STRUCT( TRUE AS flatten_json_output, TRUE AS ground_with_google_search));
Example 7
This example shows a request to a Gemini model with the following characteristics:
- Provides prompt data from a table column that's named
prompt
. - Returns a shorter generated text response.
- Filters out unsafe responses by using safety settings.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.gemini_model`, TABLE `mydataset.prompt_table`, STRUCT( 75 AS max_output_tokens, [STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category, 'BLOCK_LOW_AND_ABOVE' AS threshold), STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold)] AS safety_settings));
Visual content analysis
This example analyzes visual content from an object table that's named
dogs
and identifies the breed of dog contained in the content. The content
returned is filtered by the specified safety settings:
SELECT uri, ml_generate_text_llm_result FROM ML.GENERATE_TEXT( MODEL `mydataset.dog_identifier_model`, TABLE `mydataset.dogs`, STRUCT( 'What is the breed of the dog?' AS PROMPT, TRUE AS FLATTEN_JSON_OUTPUT, [STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category, 'BLOCK_LOW_AND_ABOVE' AS threshold), STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold)] AS safety_settings));
Audio content analysis
This example translates and transcribes audio content from an object table
that's named feedback
:
SELECT uri, ml_generate_text_llm_result FROM ML.GENERATE_TEXT( MODEL `mydataset.audio_model`, TABLE `mydataset.feedback`, STRUCT('What is the content of this audio clip, translated into Spanish?' AS PROMPT, TRUE AS FLATTEN_JSON_OUTPUT));
PDF content analysis
This example classifies PDF content from an object table
that's named documents
:
SELECT uri, ml_generate_text_llm_result FROM ML.GENERATE_TEXT( MODEL `mydataset.classify_model`, TABLE `mydataset.documents`, STRUCT('Classify this document using the following categories: legal, tax-related, real estate' AS PROMPT, TRUE AS FLATTEN_JSON_OUTPUT));
Locations
ML.GENERATE_TEXT
must run in the same
region or multi-region as the remote model that the
function references. You can create remote models over built-in
Vertex AI models in all of the
regions
that support Generative AI APIs, and also in the US
and EU
multi-regions.
You can create remote models over Claude models in all of the
supported regions
for Claude models.
Quotas
See Vertex AI and Cloud AI service functions quotas and limits.
Known issues
Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:
A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
This issue occurs because BigQuery query jobs finish successfully
even if the function fails for some of the rows. The function fails when the
volume of API calls to the remote endpoint exceeds the quota limits for that
service. This issue occurs most often when you are running multiple parallel
batch queries. BigQuery retries these calls, but if the retries
fail, the resource exhausted
error message is returned.
To iterate through inference calls until all rows are successfully processed,
you can use the
BigQuery remote inference SQL scripts
or the
BigQuery remote inference pipeline Dataform package.
To try the BigQuery ML remote inference SQL script, see
Handle quota errors by calling ML.GENERATE_TEXT
iteratively.
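The following scripting sketch illustrates the same idea inline; it is not the linked script, and the model and table names (`mydataset.gemini_model`, `mydataset.prompt_table`, `mydataset.results`) are placeholders. It re-submits only the rows whose ml_generate_text_status value is not empty, for up to five additional attempts:
DECLARE attempts INT64 DEFAULT 0;

-- First pass over all prompts. Only the prompt column is passed through so
-- that the retry pass below produces an identical schema.
CREATE OR REPLACE TABLE `mydataset.results` AS
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  (SELECT prompt FROM `mydataset.prompt_table`),
  STRUCT(TRUE AS flatten_json_output));

-- Re-submit only the failed rows (non-empty status), up to five more times.
WHILE attempts < 5 AND EXISTS (
  SELECT 1 FROM `mydataset.results` WHERE ml_generate_text_status != '') DO

  CREATE OR REPLACE TEMP TABLE retried AS
  SELECT *
  FROM ML.GENERATE_TEXT(
    MODEL `mydataset.gemini_model`,
    (SELECT prompt FROM `mydataset.results` WHERE ml_generate_text_status != ''),
    STRUCT(TRUE AS flatten_json_output));

  -- Keep the successful rows and swap in the retried ones.
  CREATE OR REPLACE TEMP TABLE merged AS
  SELECT * FROM `mydataset.results` WHERE ml_generate_text_status = ''
  UNION ALL
  SELECT * FROM retried;

  CREATE OR REPLACE TABLE `mydataset.results` AS
  SELECT * FROM merged;

  SET attempts = attempts + 1;
END WHILE;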
What's next
- Try a tutorial on generating text using a public dataset.
- Get step-by-step instructions on how to generate text using your own data.
- Get step-by-step instructions on how to tune an LLM and use it to generate text.
- For more information about using Vertex AI models to generate text and embeddings, see Generative AI overview.
- For more information about using Cloud AI APIs to perform AI tasks, see AI application overview.