INFORMATION_SCHEMA.SHARED_DATASET_USAGE view
The INFORMATION_SCHEMA.SHARED_DATASET_USAGE
view contains the near real-time
metadata about consumption of your shared dataset tables. To get started with
sharing your data across organizations, see Analytics Hub.
Required roles
To get the permission that you need to query the INFORMATION_SCHEMA.SHARED_DATASET_USAGE
view,
ask your administrator to grant you the
BigQuery Data Owner (roles/bigquery.dataOwner
) IAM role on your source project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the
bigquery.datasets.listSharedDatasetUsage
permission,
which is required to
query the INFORMATION_SCHEMA.SHARED_DATASET_USAGE
view.
You might also be able to get this permission with custom roles or other predefined roles.
Schema
The underlying data is partitioned by thejob_start_time
column and clustered by project_id
and dataset_id
.
The INFORMATION_SCHEMA.SHARED_DATASET_USAGE
has the following schema:
Column name | Data type | Value |
---|---|---|
project_id
|
STRING
|
(Clustering column) The ID of the project that contains the shared dataset. |
dataset_id
|
STRING
|
(Clustering column) The ID of the shared dataset. |
table_id
|
STRING
|
The ID of the accessed table. |
data_exchange_id
|
STRING
|
The resource path of the data exchange. |
listing_id
|
STRING
|
The resource path of the listing. |
job_start_time
|
TIMESTAMP
|
(Partitioning column) The start time of this job. |
job_end_time
|
TIMESTAMP
|
The end time of this job. |
job_id
|
STRING
|
The job ID. For example, bquxjob_1234. |
job_project_number
|
INTEGER
|
The number of the project this job belongs to. |
job_location
|
STRING
|
The location of the job. |
linked_project_number
|
INTEGER
|
The project number of the subscriber's project. |
linked_dataset_id
|
STRING
|
The linked dataset ID of the subscriber's dataset. |
subscriber_org_number
|
INTEGER
|
The organization number in which the job ran. This is the organization number of the subscriber. This field is empty for projects that don't have an organization. |
subscriber_org_display_name
|
STRING
|
A human-readable string that refers to the organization in which the job ran. This is the organization number of the subscriber. This field is empty for projects that don't have an organization. |
num_rows_processed
|
INTEGER
|
The number of rows processed from this table by the job. |
total_bytes_processed
|
INTEGER
|
The total bytes processed from this table by the job. |
Data retention
The INFORMATION_SCHEMA.SHARED_DATASET_USAGE
view contains running
jobs and the job history of the past 180 days.
Scope and syntax
Queries against this view must include a region qualifier. If you don't specify a regional qualifier, metadata is retrieved from the US region. The following table explains the region scope for this view:
View Name | Resource scope | Region scope |
---|---|---|
[PROJECT_ID.]INFORMATION_SCHEMA.SHARED_DATASET_USAGE |
Project level | US region |
[PROJECT_ID.]`region-REGION`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE |
Project level | REGION |
Optional: PROJECT_ID
: the ID of your
Google Cloud project. If not specified, the default project is used.
REGION
: any dataset region name.
For example, `region-us`
.
Examples
To run the query against a project other than your default project, add the project ID in the following format:
PROJECT_ID
.region-REGION_NAME
.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
For example, myproject.region-us.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
.
Get the total number of jobs executed on all shared tables
The following example calculates total jobs run by subscribers for a project:
SELECT COUNT(DISTINCT job_id) AS num_jobs FROM `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
The result is similar to the following:
+------------+ | num_jobs | +------------+ | 1000 | +------------+
To check the total jobs run by subscribers, use the WHERE
clause:
- For datasets, use
WHERE dataset_id = "..."
. - For tables, use
WHERE dataset_id = "..." AND table_id = "..."
.
Get the most used table based on the number of rows processed
The following query calculates the most used table based on the number of rows processed by subscribers.
SELECT dataset_id, table_id, SUM(num_rows_processed) AS usage_rows FROM `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE GROUP BY 1, 2 ORDER BY 3 DESC LIMIT 1
The output is similar to the following:
+---------------+-------------+----------------+ | dataset_id | table_id | usage_rows | +---------------+-------------+----------------+ | mydataset | mytable | 15 | +---------------+-------------+----------------+
Find the top organizations that consume your tables
The following query calculates the top subscribers based on the number of bytes
processed from your tables. You can also use the num_rows_processed
column as
a metric.
SELECT subscriber_org_number, ANY_VALUE(subscriber_org_display_name) AS subscriber_org_display_name, SUM(total_bytes_processed) AS usage_bytes FROM `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE GROUP BY 1
The output is similar to the following:
+--------------------------+--------------------------------+----------------+ |subscriber_org_number | subscriber_org_display_name | usage_bytes | +-----------------------------------------------------------+----------------+ | 12345 | myorganization | 15 | +--------------------------+--------------------------------+----------------+
For subscribers without an organization, you can use job_project_number
instead of subscriber_org_number
.
Get usage metrics for your data exchange
If your data exchange and source dataset are in different projects, follow these step to view the usage metrics for your data exchange:
- Find all listings that belong to your data exchange.
- Retrieve the source dataset attached to the listing.
- To view the usage metrics for your data exchange, use the following query:
SELECT * FROM source_project_1.`region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE WHERE dataset_id='source_dataset_id' AND data_exchange_id="projects/4/locations/us/dataExchanges/x1" UNION ALL SELECT * FROM source_project_2.`region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE WHERE dataset_id='source_dataset_id' AND data_exchange_id="projects/4/locations/us/dataExchanges/x1"