WRITE_API_TIMELINE view
The INFORMATION_SCHEMA.WRITE_API_TIMELINE
view contains per minute
aggregated BigQuery Storage Write API ingestion statistics for the current project.
You can query the INFORMATION_SCHEMA
Write API views
to retrieve historical and real-time information about data ingestion into
BigQuery that uses the BigQuery Storage Write API. See BigQuery Storage Write API for more information.
Required permission
To query the INFORMATION_SCHEMA.WRITE_API_TIMELINE
view, you need
the bigquery.tables.list
Identity and Access Management (IAM) permission for the
project.
Each of the following predefined IAM roles includes the required permission:
roles/bigquery.user
roles/bigquery.dataViewer
roles/bigquery.dataEditor
roles/bigquery.dataOwner
roles/bigquery.metadataViewer
roles/bigquery.resourceAdmin
roles/bigquery.admin
For more information about BigQuery permissions, see Access control with IAM.
Schema
When you query the INFORMATION_SCHEMA
BigQuery Storage Write API views, the query results contain historical and real-time information about data ingestion into
BigQuery using the BigQuery Storage Write API. Each row in the following views represents statistics for ingestion into a specific table, aggregated over
a one minute interval starting at start_timestamp
. Statistics are grouped by stream type and error code, so there will be one row for each stream type and
each encountered error code during the one minute interval for each timestamp
and table combination. Successful requests have the error code set to OK
. If
no data was ingested into a table during a certain time period, then no rows are present for the corresponding timestamps for that table.
The INFORMATION_SCHEMA.WRITE_API_TIMELINE_BY_*
views have the
following schema:
Column name | Data type | Value |
---|---|---|
start_timestamp |
TIMESTAMP |
(Partitioning column) Start timestamp of the 1 minute interval for the aggregated statistics. |
project_id |
STRING |
(Clustering column) ID of the project. |
project_number |
INTEGER |
Number of the project. |
dataset_id |
STRING |
(Clustering column) ID of the dataset. |
table_id |
STRING |
(Clustering column) ID of the table. |
stream_type |
STRING |
The stream type used for the data ingestion with BigQuery Storage Write API. It is supposed to be one of "DEFAULT", "COMMITTED", "BUFFERED", or "PENDING". |
error_code |
STRING |
Error code returned for the requests specified by this row. "OK" for successful requests. |
total_requests |
INTEGER |
Total number of requests within the 1 minute interval. |
total_rows |
INTEGER |
Total number of rows from all requests within the 1 minute interval. |
total_input_bytes |
INTEGER |
Total number of bytes from all rows within the 1 minute interval. |
Data retention
This view contains the BigQuery Storage Write API ingestion history of the past 180 days.
Scope and syntax
Queries against this view must include a region qualifier. If you do not specify a regional qualifier, metadata is retrieved from all regions. The following table explains the region scope for this view:
View name | Resource scope | Region scope |
---|---|---|
[PROJECT_ID.]`region-REGION`.INFORMATION_SCHEMA.WRITE_API_TIMELINE[_BY_PROJECT] |
Project level | REGION |
Optional: PROJECT_ID
: the ID of your
Google Cloud project. If not specified, the default project is used.
REGION
: any dataset region name.
For example, `region-us`
.
Example
- To query data in the US multi-region, use
`region-us`.INFORMATION_SCHEMA.WRITE_API_TIMELINE_BY_PROJECT
- To query data in the EU multi-region, use
`region-eu`.INFORMATION_SCHEMA.WRITE_API_TIMELINE_BY_PROJECT
- To query data in the asia-northeast1 region, use
`region-asia-northeast1`.INFORMATION_SCHEMA.WRITE_API_TIMELINE_BY_PROJECT
For a list of available regions, see Dataset locations.
Examples
Example 1: Recent BigQuery Storage Write API ingestion failures
The following example calculates the per minute breakdown of total failed requests for all tables in the project in the last 30 minutes, split by stream type and error code:
SELECT start_timestamp, stream_type, error_code, SUM(total_requests) AS num_failed_requests FROM `region-us`.INFORMATION_SCHEMA.WRITE_API_TIMELINE WHERE error_code != 'OK' AND start_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 30 MINUTE) GROUP BY start_timestamp, stream_type, error_code ORDER BY start_timestamp DESC;
The result is similar to the following:
+---------------------+-------------+------------------+---------------------+ | start_timestamp | stream_type | error_code | num_failed_requests | +---------------------+-------------+------------------+---------------------+ | 2023-02-24 00:25:00 | PENDING | NOT_FOUND | 5 | | 2023-02-24 00:25:00 | DEFAULT | INVALID_ARGUMENT | 1 | | 2023-02-24 00:25:00 | DEFAULT | DEADLINE_EXCEEDED| 4 | | 2023-02-24 00:24:00 | PENDING | INTERNAL | 3 | | 2023-02-24 00:24:00 | DEFAULT | INVALID_ARGUMENT | 1 | | 2023-02-24 00:24:00 | DEFAULT | DEADLINE_EXCEEDED| 2 | +---------------------+-------------+------------------+---------------------+
Example 2: Per minute breakdown for all requests with error codes
The following example calculates a per minute breakdown of successful and failed append requests, split into error code categories. This query could be used to populate a dashboard.
SELECT start_timestamp, SUM(total_requests) AS total_requests, SUM(total_rows) AS total_rows, SUM(total_input_bytes) AS total_input_bytes, SUM( IF( error_code IN ( 'INVALID_ARGUMENT', 'NOT_FOUND', 'CANCELLED', 'RESOURCE_EXHAUSTED', 'ALREADY_EXISTS', 'PERMISSION_DENIED', 'UNAUTHENTICATED', 'FAILED_PRECONDITION', 'OUT_OF_RANGE'), total_requests, 0)) AS user_error, SUM( IF( error_code IN ( 'DEADLINE_EXCEEDED','ABORTED', 'INTERNAL', 'UNAVAILABLE', 'DATA_LOSS', 'UNKNOWN'), total_requests, 0)) AS server_error, SUM(IF(error_code = 'OK', 0, total_requests)) AS total_error, FROM `region-us`.INFORMATION_SCHEMA.WRITE_API_TIMELINE GROUP BY start_timestamp ORDER BY start_timestamp DESC;
The result is similar to the following:
+---------------------+----------------+------------+-------------------+------------+--------------+-------------+ | start_timestamp | total_requests | total_rows | total_input_bytes | user_error | server_error | total_error | +---------------------+----------------+------------+-------------------+------------+--------------+-------------+ | 2020-04-15 22:00:00 | 441854 | 441854 | 23784853118 | 0 | 17 | 17 | | 2020-04-15 21:59:00 | 355627 | 355627 | 26101982742 | 8 | 0 | 13 | | 2020-04-15 21:58:00 | 354603 | 354603 | 26160565341 | 0 | 0 | 0 | | 2020-04-15 21:57:00 | 298823 | 298823 | 23877821442 | 2 | 0 | 2 | +---------------------+----------------+------------+-------------------+------------+--------------+-------------+
Example 3: Tables with the most incoming traffic
The following example returns the BigQuery Storage Write API ingestion statistics for the 10 tables with the most incoming traffic:
SELECT project_id, dataset_id, table_id, SUM(total_rows) AS num_rows, SUM(total_input_bytes) AS num_bytes, SUM(total_requests) AS num_requests FROM `region-us`.INFORMATION_SCHEMA.WRITE_API_TIMELINE_BY_PROJECT GROUP BY project_id, dataset_id, table_id ORDER BY num_bytes DESC LIMIT 10;
The result is similar to the following:
+----------------------+------------+-------------------------------+------------+----------------+--------------+ | project_id | dataset_id | table_id | num_rows | num_bytes | num_requests | +----------------------+------------+-------------------------------+------------+----------------+--------------+ | my-project | dataset1 | table1 | 8016725532 | 73787301876979 | 8016725532 | | my-project | dataset1 | table2 | 26319580 | 34199853725409 | 26319580 | | my-project | dataset2 | table1 | 38355294 | 22879180658120 | 38355294 | | my-project | dataset1 | table3 | 270126906 | 17594235226765 | 270126906 | | my-project | dataset2 | table2 | 95511309 | 17376036299631 | 95511309 | | my-project | dataset2 | table3 | 46500443 | 12834920497777 | 46500443 | | my-project | dataset2 | table4 | 25846270 | 7487917957360 | 25846270 | | my-project | dataset1 | table4 | 18318404 | 5665113765882 | 18318404 | | my-project | dataset1 | table5 | 42829431 | 5343969665771 | 42829431 | | my-project | dataset1 | table6 | 8771021 | 5119004622353 | 8771021 | +----------------------+------------+-------------------------------+------------+----------------+--------------+
Example 4: BigQuery Storage Write API ingestion error ratio for a table
The following example calculates a per-day breakdown of errors for a specific table, split by error code:
SELECT TIMESTAMP_TRUNC(start_timestamp, DAY) as day, project_id, dataset_id, table_id, error_code, SUM(total_rows) AS num_rows, SUM(total_input_bytes) AS num_bytes, SUM(total_requests) AS num_requests FROM `region-us`.INFORMATION_SCHEMA.WRITE_API_TIMELINE_BY_PROJECT WHERE table_id LIKE 'my_table' GROUP BY project_id, dataset_id, table_id, error_code, day ORDER BY day, project_id, dataset_id DESC;
The result is similar to the following:
+---------------------+-------------+------------+----------+----------------+----------+-----------+--------------+ | day | project_id | dataset_id | table_id | error_code | num_rows | num_bytes | num_requests | +---------------------+-------------+------------+----------+----------------+----------+-----------+--------------+ | 2020-04-21 00:00:00 | my_project | my_dataset | my_table | OK | 41 | 252893 | 41 | | 2020-04-20 00:00:00 | my_project | my_dataset | my_table | OK | 2798 | 10688286 | 2798 | | 2020-04-19 00:00:00 | my_project | my_dataset | my_table | OK | 2005 | 7979495 | 2005 | | 2020-04-18 00:00:00 | my_project | my_dataset | my_table | OK | 2054 | 7972378 | 2054 | | 2020-04-17 00:00:00 | my_project | my_dataset | my_table | OK | 2056 | 6978079 | 2056 | | 2020-04-17 00:00:00 | my_project | my_dataset | my_table | INTERNAL | 4 | 10825 | 4 | +---------------------+-------------+------------+----------+----------------+----------+-----------+--------------+