INFORMATION_SCHEMA.SHARED_DATASET_USAGE view

The INFORMATION_SCHEMA.SHARED_DATASET_USAGE view contains the near real-time metadata about consumption of your shared dataset tables. To get started with sharing your data across organizations, see Analytics Hub.

Required roles

To get the permission that you need to query the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view, ask your administrator to grant you the BigQuery Data Owner (roles/bigquery.dataOwner) IAM role on your source project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the bigquery.datasets.listSharedDatasetUsage permission, which is required to query the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.

You might also be able to get this permission with custom roles or other predefined roles.

Schema

The underlying data is partitioned by the job_start_time column and clustered by project_id and dataset_id.

The INFORMATION_SCHEMA.SHARED_DATASET_USAGE has the following schema:

Column name	Data type	Value
`project_id`	`STRING`	*(Clustering column)* The ID of the project that contains the shared dataset.
`dataset_id`	`STRING`	*(Clustering column)* The ID of the shared dataset.
`table_id`	`STRING`	The ID of the accessed table.
`data_exchange_id`	`STRING`	The resource path of the data exchange.
`listing_id`	`STRING`	The resource path of the listing.
`job_start_time`	`TIMESTAMP`	*(Partitioning column)* The start time of this job.
`job_end_time`	`TIMESTAMP`	The end time of this job.
`job_id`	`STRING`	The job ID. For example, bquxjob_1234.
`job_project_number`	`INTEGER`	The number of the project this job belongs to.
`job_location`	`STRING`	The location of the job.
`linked_project_number`	`INTEGER`	The project number of the subscriber's project.
`linked_dataset_id`	`STRING`	The linked dataset ID of the subscriber's dataset.
`subscriber_org_number`	`INTEGER`	The organization number in which the job ran. This is the organization number of the subscriber. This field is empty for projects that don't have an organization.
`subscriber_org_display_name`	`STRING`	A human-readable string that refers to the organization in which the job ran. This is the organization number of the subscriber. This field is empty for projects that don't have an organization.
`num_rows_processed`	`INTEGER`	The number of rows processed from this table by the job.
`total_bytes_processed`	`INTEGER`	The total bytes processed from this table by the job.

Data retention

The INFORMATION_SCHEMA.SHARED_DATASET_USAGE view contains running jobs and the job history of the past 180 days.

Scope and syntax

Queries against this view must include a region qualifier. If you don't specify a regional qualifier, metadata is retrieved from the US region. The following table explains the region scope for this view:

View Name	Resource scope	Region scope
`[PROJECT_ID.]INFORMATION_SCHEMA.SHARED_DATASET_USAGE`	Project level	US region
[PROJECT_ID.]`region-REGION`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE	Project level	`REGION`

Replace the following:

Optional: PROJECT_ID: the ID of your Google Cloud project. If not specified, the default project is used.

REGION: any dataset region name. For example, `region-us`.

Note: You must use a region qualifier to query INFORMATION_SCHEMA views. The location of the query execution must match the region of the INFORMATION_SCHEMA view.

Examples

To run the query against a project other than your default project, add the project ID in the following format:

PROJECT_ID.region-REGION_NAME.INFORMATION_SCHEMA.SHARED_DATASET_USAGE

For example, myproject.region-us.INFORMATION_SCHEMA.SHARED_DATASET_USAGE.

Get the total number of jobs executed on all shared tables

The following example calculates total jobs run by subscribers for a project:

SELECT
  COUNT(DISTINCT job_id) AS num_jobs
FROM
  `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE

The result is similar to the following:

+------------+
| num_jobs   |
+------------+
| 1000       |
+------------+

To check the total jobs run by subscribers, use the WHERE clause:

For datasets, use WHERE dataset_id = "...".
For tables, use WHERE dataset_id = "..." AND table_id = "...".

Get the most used table based on the number of rows processed

The following query calculates the most used table based on the number of rows processed by subscribers.

SELECT
  dataset_id,
  table_id,
  SUM(num_rows_processed) AS usage_rows
FROM
  `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
GROUP BY
  1,
  2
ORDER BY
  3 DESC
LIMIT
  1

The output is similar to the following:

+---------------+-------------+----------------+
| dataset_id    | table_id      | usage_rows     |
+---------------+-------------+----------------+
| mydataset     | mytable     | 15             |
+---------------+-------------+----------------+

Find the top organizations that consume your tables

The following query calculates the top subscribers based on the number of bytes processed from your tables. You can also use the num_rows_processed column as a metric.

SELECT
  subscriber_org_number,
  ANY_VALUE(subscriber_org_display_name) AS subscriber_org_display_name,
  SUM(total_bytes_processed) AS usage_bytes
FROM
  `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
GROUP BY
  1

The output is similar to the following:

+--------------------------+--------------------------------+----------------+
|subscriber_org_number     | subscriber_org_display_name    | usage_bytes    |
+-----------------------------------------------------------+----------------+
| 12345                    | myorganization                 | 15             |
+--------------------------+--------------------------------+----------------+

For subscribers without an organization, you can use job_project_number instead of subscriber_org_number.

Get usage metrics for your data exchange

If your data exchange and source dataset are in different projects, follow these step to view the usage metrics for your data exchange:

Find all listings that belong to your data exchange.
Retrieve the source dataset attached to the listing.
To view the usage metrics for your data exchange, use the following query:

SELECT
  *
FROM
  source_project_1.`region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
WHERE
  dataset_id='source_dataset_id'
AND data_exchange_id="projects/4/locations/us/dataExchanges/x1"
UNION ALL
SELECT
  *
FROM
  source_project_2.`region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
WHERE
  dataset_id='source_dataset_id'
AND data_exchange_id="projects/4/locations/us/dataExchanges/x1"