Introduction to data transformation

This document describes the different ways you can transform data in your BigQuery tables.

For more information about data integration, see Introduction to loading, transforming, and exporting data.

Methods of transforming data

You can transform data in BigQuery in the following ways:

  • Use data manipulation language (DML) to transform data in your BigQuery tables.
  • Use Dataform to develop, test, version control, and schedule SQL workflows in BigQuery.
  • Prepare data in BigQuery with context-aware, AI-generated transformation recommendations to cleanse data for analysis.

You can also review the change history of a BigQuery table to examine the transformations made to a table in a specified time range.
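
For example, the change history table-valued functions let you query what changed in a table during a time range. The following is a minimal sketch that assumes a hypothetical table named mydataset.mytable and uses the APPENDS function to list rows appended between two timestamps:

    -- Rows appended to mydataset.mytable (a hypothetical table) in a given time range.
    SELECT *
    FROM APPENDS(
      TABLE mydataset.mytable,
      TIMESTAMP '2024-01-01 00:00:00 UTC',  -- start of the time range
      TIMESTAMP '2024-01-02 00:00:00 UTC'   -- end of the time range
    );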

Transform data with DML

You can use data manipulation language (DML) to transform data in your BigQuery tables. DML statements are GoogleSQL queries that manipulate existing table data to add or delete rows, modify data in existing rows, or merge data with values from another table. DML transformations are also supported in partitioned tables.
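
For example, the following MERGE statement, a sketch that assumes hypothetical mydataset.inventory and mydataset.inventory_updates tables, updates matching rows and inserts new rows in a single DML transformation:

    -- Merge staged updates into an existing table: update quantities for
    -- matching products and insert rows for products that are new.
    MERGE mydataset.inventory AS target
    USING mydataset.inventory_updates AS source
    ON target.product_id = source.product_id
    WHEN MATCHED THEN
      UPDATE SET quantity = source.quantity
    WHEN NOT MATCHED THEN
      INSERT (product_id, quantity)
      VALUES (source.product_id, source.quantity);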

You can run multiple DML statements concurrently: BigQuery queues the statements and applies the transformations to your data one after the other. BigQuery manages how concurrent DML statements are run based on the type of transformation.

Transform data with Dataform

Dataform lets you manage data transformation in the extract, load, and transform (ELT) process for data integration. After extracting raw data from source systems and loading it into BigQuery, you can use Dataform to transform it into an organized, tested, and documented suite of tables. Whereas DML takes an imperative approach, where you tell BigQuery exactly how to transform your data, Dataform takes a declarative approach: you describe the state you want, and Dataform determines the transformations needed to achieve it.

In Dataform, you can develop, test, and version control SQL workflows for data transformation from data source declarations to output tables, views, or materialized views. You can develop SQL workflows with Dataform core or pure JavaScript. Dataform core is an open source meta-language that extends SQL with SQLX and JavaScript. You can use Dataform core to manage dependencies, set up automated data quality testing, and document table or column descriptions within the code.
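
As a sketch, a single Dataform core SQLX file (the file, table, and column names here are hypothetical) can declare a table, document it, define a data quality assertion, and reference another dataset in the workflow through the ref function:

    -- definitions/daily_orders.sqlx
    config {
      type: "table",
      description: "Daily order totals derived from the raw orders source.",
      assertions: {
        nonNull: ["order_date", "total_amount"]
      }
    }

    SELECT
      DATE(order_timestamp) AS order_date,
      SUM(order_amount) AS total_amount
    -- ref() creates a dependency on another table or declared source in the workflow.
    FROM ${ref("raw_orders")}
    GROUP BY order_date

Dataform compiles files like this into SQL that it runs in BigQuery, ordered by the dependencies declared with ref.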

Dataform stores your SQL workflow code in repositories and uses Git to track file changes. Development workspaces in Dataform let you work on the contents of a repository without affecting the work of others in the same repository. You can connect Dataform repositories to third-party Git providers, including Azure DevOps Services, Bitbucket, GitHub, and GitLab.

You can run or schedule SQL workflows with Dataform release configurations and workflow configurations. Alternatively, you can schedule executions with Cloud Composer, or with Workflows and Cloud Scheduler. During execution, Dataform runs the SQL queries in BigQuery in the order of object dependencies in your SQL workflow. After execution, you can use your defined tables and views for analysis in BigQuery.

To learn more about creating data transformation SQL workflows in Dataform, see Dataform overview and Overview of Dataform features.

Prepare data in BigQuery

To reduce the toil of data preparation, BigQuery lets you clean data with Gemini-generated transformation suggestions. Data preparation in BigQuery offers the following assistance:

  • Applying transformations and data quality rules
  • Standardizing and enriching data
  • Automating schema mapping

You can validate the results in a data preview before applying the changes to all of your data.

For more information, see Introduction to BigQuery data preparation.

What's next