Unifying Your Cloud Data Strategy: A Seamless BigQuery to Redshift Migration
Introduction
As organizations grow, they often revisit their data strategies to optimize cost, improve performance, and consolidate platforms. Many customers running analytics workloads in a multi-cloud environment are choosing to migrate their data warehouses from services like Google BigQuery to Amazon Redshift, a fully managed, fast, and scalable cloud data warehouse that is tightly integrated with the AWS ecosystem.
In this blog post, we walk through the process of migrating your BigQuery workloads to Amazon Redshift using the AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT), and explore the details, considerations, and limitations that teams should be aware of before undertaking this transition.
Use Case: Cloud-to-Cloud Analytics Consolidation
Take, for example, an organization that initially chose Google BigQuery to house feature datasets and inference results for its machine learning models. Over time, however, it expanded its data estate into AWS to take advantage of services such as Amazon SageMaker, AWS Glue, and Amazon S3. As data and processes gradually shifted to AWS, the business decided to consolidate its data warehouse on Amazon Redshift to reduce the complexity and cost of data movement across cloud providers, simplify security and compliance, and improve performance with features such as Redshift Spectrum and Redshift Serverless.
The organization needed a structured, repeatable and consistent method to migrate schema definitions, historical datasets and analytic workflows with minimal downtime. AWS DMS provided the automation and flexibility required for this migration.
Overview of Migration Architecture
The migration approach follows a lift-and-transform strategy with a few key components:
- AWS Schema Conversion Tool (SCT) - because the source and target data warehouses run on heterogeneous engines, a schema conversion tool is needed to apply the correct data type transformations during migration, ensuring compatibility and optimizing performance across the two platforms. The schemas converted by AWS SCT can target not only Amazon Redshift but also Amazon RDS, Amazon Aurora DB clusters, and even databases running on EC2 instances.
- Data extraction and transfer agent - AWS SCT relies on a data extraction agent to transfer data from GCP to AWS. The agent must be installed and configured before the migration task is executed. For more information, see the “Install and Configure Data Extraction Agents” section of the AWS documentation.
- GCP service account - to connect to BigQuery and other GCP services, AWS SCT uses a service account, which is an account designed to be used by applications or workloads rather than people.
- Temporary storage (GCS/S3) - after extraction from BigQuery, the data must be temporarily persisted in a GCS bucket while the data extraction agent performs the migration, and it must then be staged in an S3 bucket before it is loaded into Redshift. Both buckets are required because the agent cannot read data directly out of BigQuery and data cannot be uploaded directly into Amazon Redshift. Make sure the proper permissions are in place to read from and write to both buckets (see the sketch after this list for one way to verify this).
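Before kicking off a migration, it can save time to confirm that the credentials on the agent host can actually write to both staging locations. The following is a minimal sketch of such a check, assuming hypothetical bucket names and a hypothetical service-account key file; it is not part of the SCT setup itself and requires the google-cloud-storage and boto3 packages.

```python
# Minimal permission check for the two staging buckets. The key file and
# bucket names below are hypothetical placeholders.
import boto3
from google.cloud import storage
from google.oauth2 import service_account

GCP_KEY_FILE = "sct-service-account.json"   # hypothetical service-account key
GCS_BUCKET = "my-sct-staging-gcs-bucket"    # hypothetical GCS staging bucket
S3_BUCKET = "my-sct-staging-s3-bucket"      # hypothetical S3 staging bucket

# Verify the GCP service account can write to the GCS staging bucket.
creds = service_account.Credentials.from_service_account_file(GCP_KEY_FILE)
gcs = storage.Client(credentials=creds, project=creds.project_id)
gcs.bucket(GCS_BUCKET).blob("sct-permission-check.txt").upload_from_string("ok")

# Verify the AWS credentials on the agent host can write to the S3 staging bucket.
s3 = boto3.client("s3")
s3.put_object(Bucket=S3_BUCKET, Key="sct-permission-check.txt", Body=b"ok")

print("Both staging buckets are writable.")
```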
The core difference between this use case and a standard data replication scenario, such as migrating an on-premises database to an RDS instance, is that AWS SCT manages the replication task instead of the DMS service. Normally, to create a replication task, you would define a replication instance, source and target database endpoints, and a replication task through the AWS Database Migration Service. Here, the Schema Conversion Tool manages the task’s lifecycle and performs the data transfer, so the process (also called a “local task”) is defined within the tool itself. Both SCT and the data extraction agent must be manually configured in advance and running on a local machine or an EC2 instance (Windows, Red Hat Enterprise Linux, and Ubuntu are supported).
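For contrast, this is roughly what the standard DMS path looks like when scripted with boto3. The endpoint ARNs, identifiers, and table mappings below are hypothetical placeholders, and none of these calls are needed for the SCT-managed local task described above.

```python
# Sketch of a standard DMS replication task created via boto3 (shown only for
# contrast with the SCT-managed local task). All identifiers and ARNs are
# hypothetical placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="eu-west-1")

task = dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-rds-full-load",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",
    MigrationType="full-load",  # or "full-load-and-cdc" for ongoing replication
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)

# The task must reach the "ready" state before it can be started.
dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```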
The following architecture diagram illustrates this approach:

An in-depth, step-by-step walkthrough of setting up the migration is available in the “Migrating a BigQuery Project to Amazon Redshift” AWS documentation [1].
Considerations and Limitations
There are several considerations to keep in mind when performing this type of data migration.
Firstly, as opposed to a standard replication job, when replicating data from BigQuery to Redshift you need to manually provision the compute resource (whether a local on-premises machine or a VM in the cloud) that will run the replication task through AWS SCT and the data extraction agent. If you are working with large volumes of data, you may be limited by the capabilities of the underlying hardware executing the transfer. In that case, AWS recommends creating virtual partitions for your tables in AWS SCT; the tool then creates a separate subtask for each partition, making the migration process more resilient.
Secondly, since BigQuery and Redshift are built on different engines, some queries supported by the source database may not be supported by the target. A typical example is the ARRAY_AGG function which, although present in BigQuery, has limited support in Redshift, where the closest equivalent for string aggregation is LISTAGG. This does not only concern the schema conversion process: any future SQL queries executed on the target database need to account for the dialect and the operations supported by that flavor of SQL. SCT handles most of the conversions automatically, but some SQL logic needs to be manually rewritten by the data engineers.
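As a concrete illustration, here is a sketch of the kind of rewrite involved, with the rewritten query submitted through the Redshift Data API via boto3. The table, column, and workgroup names are hypothetical, and LISTAGG is used only where a delimited string result is acceptable; it is not a drop-in replacement for every ARRAY_AGG usage.

```python
# Hypothetical example of rewriting a BigQuery aggregation for Redshift and
# running it through the Redshift Data API. Table, column, and workgroup names
# are placeholders.
import boto3

# BigQuery (source dialect):
#   SELECT model_id, ARRAY_AGG(feature_name ORDER BY feature_name) AS features
#   FROM ml.feature_catalog GROUP BY model_id;

# Redshift (target dialect): LISTAGG returns a delimited string rather than an
# array, so downstream consumers may need to split the value again.
redshift_sql = """
    SELECT model_id,
           LISTAGG(feature_name, ',') WITHIN GROUP (ORDER BY feature_name) AS features
    FROM ml.feature_catalog
    GROUP BY model_id;
"""

client = boto3.client("redshift-data")
response = client.execute_statement(
    WorkgroupName="analytics-serverless",  # or ClusterIdentifier for a provisioned cluster
    Database="dev",
    Sql=redshift_sql,
)
print("Statement submitted:", response["Id"])
```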
Finally, both the local tasks managed by AWS SCT and the replication tasks managed by AWS DMS provide task-level monitoring, such as a status bar that estimates each task’s progress through the extraction, upload, and copy phases. However, the DMS service exposes additional metrics, including the number of inserts, updates, and deletes at the table level, as well as more advanced metrics such as “full load throughput rows - source” (how fast data is read from the source database during the initial full load phase, in rows per second) and “CDC latency - target” (the time in seconds between the oldest event still waiting to be committed to the target and the current system time of the DMS instance).
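If you do run part of the workload through DMS-managed tasks, these metrics can also be pulled programmatically from CloudWatch. The snippet below is a minimal sketch assuming hypothetical replication instance and task identifiers and the standard AWS/DMS metric dimensions; metric availability depends on the task type (CDC latency only appears for CDC-enabled tasks).

```python
# Minimal sketch for reading DMS task metrics from CloudWatch. The replication
# instance and task identifiers are hypothetical placeholders.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

for metric_name in ["FullLoadThroughputRowsSource", "CDCLatencyTarget"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName=metric_name,
        Dimensions=[
            {"Name": "ReplicationInstanceIdentifier", "Value": "my-dms-instance"},
            {"Name": "ReplicationTaskIdentifier", "Value": "my-replication-task"},
        ],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    print(metric_name, [point["Average"] for point in stats["Datapoints"]])
```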
Conclusion
Migrating from Google BigQuery to Amazon Redshift allows you to simplify your data architecture, streamline your analytics stack within AWS and take advantage of the tight integration with services like Amazon S3, SageMaker, and AWS Glue.
By leveraging AWS SCT and DMS, organizations can automate large parts of the migration process, reducing the time and risk associated with replatforming cloud-native analytics workloads. Additionally, the tools provide comprehensive assessment reports, helping teams identify potential incompatibility issues between heterogeneous database technologies before they arise, thus enhancing the overall reliability of the migration.
References
[1] “Migrating a BigQuery Project to Amazon Redshift”, AWS docs, (https://docs.aws.amazon.com/dms/latest/sbs/bigquery-redshift.html)