Introduction to loading data
This document explains how you can load data into BigQuery. The two common approaches to data integration are to extract, load, and transform (ELT) or to extract, transform, load (ETL) data.
For an overview of ELT and ETL approaches, see Introduction to loading, transforming, and exporting data.
Methods of loading or accessing external data
You can use the following methods to load data into BigQuery or access data from BigQuery. Choose one of the following options based on your use case and data sources:
Loading method | Description |
---|---|
Batch load | This method is suitable for batch loading large volumes of data from a
variety of sources. For batch or incremental loading of data from Cloud Storage and other supported data sources, we recommend using the BigQuery Data Transfer Service. With the BigQuery Data Transfer Service, to automate data loading workflows into BigQuery, you can schedule load jobs. You can schedule one-time or batch data transfers at regular intervals (for example, daily or monthly). To ensure that your BigQuery data is always current, you can monitor and log your transfers. For a list of data sources supported by the BigQuery Data Transfer Service, see Supported data sources. |
Streaming load | This method enables loading data in near real time from messaging
systems. To stream data into BigQuery, you can use a BigQuery subscription in Pub/Sub. Pub/Sub can handle high throughput of data loads into BigQuery. It supports real-time data streaming, loading data as it's generated. For more information, see BigQuery subscriptions. |
Change Data Capture (CDC) | This method enables replicating data from databases to
BigQuery in near real time. Datastream can stream data from databases to BigQuery data with near real-time replication. Datastream leverages CDC capabilities to track and replicate row-level changes from your data sources. For a list of data sources supported by Datastream, see Sources. |
Federation to external data sources | This method enables access to external data without loading it
into BigQuery. BigQuery supports accessing select external data sources through Cloud Storage and federated queries. The advantage of this method is that you don't need to load the data before transforming it for subsequent use. You can perform the transformation by running SELECT statements over the external data. |
You can also use the following programmatic methods to load the data:
Loading method | Description |
---|---|
Batch load | You can load data from Cloud Storage or
from a local file by creating a load job. If your source data changes infrequently, or you don't need continuously updated results, load jobs can be a less expensive, less resource-intensive way to load your data into BigQuery. The loaded data can be in Avro, CSV, JSON, ORC, or Parquet format. To create the load job, you can also use the LOAD DATA SQL
statement.Popular open source systems, such as Spark and various ETL partners, also support batch loading data into BigQuery. |
Streaming load | If you must support custom streaming data sources, or preprocess
data before streaming it with large throughput into
BigQuery, use Dataflow or
BigQuery Engine for Apache Flink. For more information about loading from Dataflow to BigQuery, see Write from Dataflow to BigQuery. You can also directly use BigQuery Storage Write API. |
Cloud Data Fusion can help facilitate your ETL process. BigQuery also works with 3rd party partners that transform and load data into BigQuery.
Other ways to acquire data
You can run queries on data without loading it into BigQuery yourself. The following sections describe some alternatives.
The following list describes some of the alternatives:
Run queries on public data
Public datasets are datasets stored in BigQuery and shared with the public. For more information, see BigQuery public datasets.
Run queries on shared data
To run queries on a BigQuery dataset that someone has shared with you, see Introduction to Analytics Hub. Analytics Hub is a data exchange platform that enables data sharing.
Run queries with log data
You can run queries on logs without creating additional load jobs:
Cloud Logging lets you route logs to a BigQuery destination.
Log Analytics lets you run queries that analyze your log data.
What's next
- Learn how to prepare data with Gemini in BigQuery.
- Learn more about transforming data with Dataform.
- Learn more about monitoring load jobs in the administrative jobs explorer and BigQuery metrics.