Specify nested and repeated columns in table schemas
This page describes how to define a table schema with nested and repeated columns in BigQuery. For an overview of table schemas, see Specifying a schema.
Define nested and repeated columns
To create a column with nested data, set the data type of the column to
RECORD
in the schema. A RECORD
can be accessed as a
STRUCT
type
in GoogleSQL. A STRUCT
is a container of ordered fields.
To create a column with repeated data, set the
mode of the column to REPEATED
in the schema.
A repeated field can be accessed as an
ARRAY
type in
GoogleSQL.
A RECORD
column can have REPEATED
mode, which is represented as an array of
STRUCT
types. Also, a field within a record can be repeated, which is
represented as a STRUCT
that contains an ARRAY
. An array cannot contain
another array directly. For more information, see
Declaring an ARRAY
type.
Limitations
Nested and repeated schemas are subject to the following limitations:
- A schema cannot contain more than 15 levels of nested
RECORD
types. - Columns of type
RECORD
can contain nestedRECORD
types, also called child records. The maximum nested depth limit is 15 levels. This limit is independent of whether theRECORD
s are scalar or array-based (repeated).
RECORD
type is incompatible with UNION
, INTERSECT
, EXCEPT DISTINCT
, and SELECT DISTINCT
.
Example schema
The following example shows sample nested and repeated data. This table contains information about people. It consists of the following fields:
id
first_name
last_name
dob
(date of birth)addresses
(a nested and repeated field)addresses.status
(current or previous)addresses.address
addresses.city
addresses.state
addresses.zip
addresses.numberOfYears
(years at the address)
The JSON data file would look like the following. Notice that the addresses
column contains an array of values (indicated by [ ]
). The multiple addresses
in the array are the repeated data. The multiple fields within each address are
the nested data.
{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]} {"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}
The schema for this table looks like the following:
[ { "name": "id", "type": "STRING", "mode": "NULLABLE" }, { "name": "first_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "last_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "dob", "type": "DATE", "mode": "NULLABLE" }, { "name": "addresses", "type": "RECORD", "mode": "REPEATED", "fields": [ { "name": "status", "type": "STRING", "mode": "NULLABLE" }, { "name": "address", "type": "STRING", "mode": "NULLABLE" }, { "name": "city", "type": "STRING", "mode": "NULLABLE" }, { "name": "state", "type": "STRING", "mode": "NULLABLE" }, { "name": "zip", "type": "STRING", "mode": "NULLABLE" }, { "name": "numberOfYears", "type": "STRING", "mode": "NULLABLE" } ] } ]
Specifying the nested and repeated columns in the example
To create a new table with the previous nested and repeated columns, select one of the following options:
Console
Specify the nested and repeated addresses
column:
In the Google Cloud console, open the BigQuery page.
In the Explorer panel, expand your project and select a dataset.
In the details panel, click
Create table.On the Create table page, specify the following details:
- For Source, in the Create table from field, select Empty table.
In the Destination section, specify the following fields:
- For Dataset, select the dataset in which you want to create the table.
- For Table, enter the name of the table that you want to create.
For Schema, click
Add field and enter the following table schema:- For Field name, enter
addresses
. - For Type, select RECORD.
- For Mode, choose REPEATED.
Specify the following fields for a nested field:
- In the Field name field, enter
status
. - For Type, choose STRING.
- For Mode, leave the value set to NULLABLE.
Click
Add field to add the following fields:Field name Type Mode address
STRING
NULLABLE
city
STRING
NULLABLE
state
STRING
NULLABLE
zip
STRING
NULLABLE
numberOfYears
STRING
NULLABLE
Alternatively, click Edit as text and specify the schema as a JSON array.
- In the Field name field, enter
- For Field name, enter
SQL
Use the
CREATE TABLE
statement.
Specify the schema using the
column
option:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE TABLE IF NOT EXISTS mydataset.mytable ( id STRING, first_name STRING, last_name STRING, dob DATE, addresses ARRAY< STRUCT< status STRING, address STRING, city STRING, state STRING, zip STRING, numberOfYears STRING>> ) OPTIONS ( description = 'Example name and addresses table');
Click
Run.
For more information about how to run queries, see Run an interactive query.
bq
To specify the nested and repeated addresses
column in a JSON schema file,
use a text editor to create a new file. Paste in the example schema
definition shown above.
After you create your JSON schema file, you can provide it through the bq command-line tool. For more information, see Using a JSON schema file.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Insert data in nested columns in the example
Use the following queries to insert nested data records into tables that have RECORD
data type columns.
Example 1
INSERT INTO mydataset.mytable (id, first_name, last_name, dob, addresses) values ("1","Johnny","Dawn","1969-01-22", ARRAY< STRUCT< status STRING, address STRING, city STRING, state STRING, zip STRING, numberOfYears STRING>> [("current","123 First Avenue","Seattle","WA","11111","1")])
Example 2
INSERT INTO mydataset.mytable (id, first_name, last_name, dob, addresses) values ("1","Johnny","Dawn","1969-01-22",[("current","123 First Avenue","Seattle","WA","11111","1")])
Query nested and repeated columns
To select the value of an ARRAY
at a specific position, use an array subscript
operator.
To access elements in a STRUCT
, use the
dot operator.
The following example selects the first name, last name, and first address
listed in the addresses
field:
SELECT first_name, last_name, addresses[offset(0)].address FROM mydataset.mytable;
The result is the following:
+------------+-----------+------------------+ | first_name | last_name | address | +------------+-----------+------------------+ | John | Doe | 123 First Avenue | | Jane | Doe | 789 Any Avenue | +------------+-----------+------------------+
To extract all elements of an ARRAY
, use the
UNNEST
operator
with a CROSS JOIN
.
The following example selects the first name, last name, address, and state for
all addresses not located in New York:
SELECT first_name, last_name, a.address, a.state FROM mydataset.mytable CROSS JOIN UNNEST(addresses) AS a WHERE a.state != 'NY';
The result is the following:
+------------+-----------+------------------+-------+ | first_name | last_name | address | state | +------------+-----------+------------------+-------+ | John | Doe | 123 First Avenue | WA | | John | Doe | 456 Main Street | OR | | Jane | Doe | 321 Main Street | NJ | +------------+-----------+------------------+-------+
Modify nested and repeated columns
After you add a nested column or a nested and repeated column to a table's schema definition, you can modify the column as you would any other type of column. BigQuery natively supports several schema changes such as adding a new nested field to a record or relaxing a nested field's mode. For more information, see Modifying table schemas.
When to use nested and repeated columns
BigQuery performs best when your data is denormalized. Rather than preserving a relational schema such as a star or snowflake schema, denormalize your data and take advantage of nested and repeated columns. Nested and repeated columns can maintain relationships without the performance impact of preserving a relational (normalized) schema.
For example, a relational database used to track library books would likely keep
all author information in a separate table. A key such as author_id
would be
used to link the book to the authors.
In BigQuery, you can preserve the relationship between book and author without creating a separate author table. Instead, you create an author column, and you nest fields within it such as the author's first name, last name, date of birth, and so on. If a book has multiple authors, you can make the nested author column repeated.
Suppose you have the following table mydataset.books
:
+------------------+------------+-----------+ | title | author_ids | num_pages | +------------------+------------+-----------+ | Example Book One | [123, 789] | 487 | | Example Book Two | [456] | 89 | +------------------+------------+-----------+
You also have the following table, mydataset.authors
, with complete
information for each author ID:
+-----------+-------------+---------------+ | author_id | author_name | date_of_birth | +-----------+-------------+---------------+ | 123 | Alex | 01-01-1960 | | 456 | Rosario | 01-01-1970 | | 789 | Kim | 01-01-1980 | +-----------+-------------+---------------+
If the tables are large, it might be resource intensive to join them regularly. Depending on your situation, it might be beneficial to create a single table that contains all the information:
CREATE TABLE mydataset.denormalized_books( title STRING, authors ARRAY<STRUCT<id INT64, name STRING, date_of_birth STRING>>, num_pages INT64) AS ( SELECT title, ARRAY_AGG(STRUCT(author_id, author_name, date_of_birth)) AS authors, ANY_VALUE(num_pages) FROM mydataset.books, UNNEST(author_ids) id JOIN mydataset.authors ON id = author_id GROUP BY title );
The resulting table looks like the following:
+------------------+-------------------------------+-----------+ | title | authors | num_pages | +------------------+-------------------------------+-----------+ | Example Book One | [{123, Alex, 01-01-1960}, | 487 | | | {789, Kim, 01-01-1980}] | | | Example Book Two | [{456, Rosario, 01-01-1970}] | 89 | +------------------+-------------------------------+-----------+
BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON files, Avro files, Firestore export files, and Datastore export files.
Deduplicate duplicate records in a table
The following query uses the row_number()
function to identify duplicate records that have the same values for
last_name
and first_name
in the examples used and sorts them by dob
:
CREATE OR REPLACE TABLE mydataset.mytable AS ( SELECT * except(row_num) FROM ( SELECT *, row_number() over (partition by last_name, first_name order by dob) row_num FROM mydataset.mytable) temp_table WHERE row_num=1 )
Table security
To control access to tables in BigQuery, see Introduction to table access controls.
What's next
- To insert and update rows with nested and repeated columns, see Data manipulation language syntax.