Teradata to BigQuery Migration: Challenges, Best Practices, and Practical Guide
We’re not going to sugarcoat it: Migrating from Teradata to BigQuery is complex. Don’t worry—our guide will simplify and streamline this process for you. We’ll illuminate the path, highlighting the key contrasts between Teradata’s traditional on-premises system and Google BigQuery's innovative cloud-based platform. We’ll explore these differences in depth, focusing on storage solutions, data processing capabilities, and scalability options.
We’ll also address critical business topics, ensuring you make a smooth transition to the cloud. And no cloud database migration guide would be complete without guidance to secure stakeholder buy-in, optimize your budget, and minimize operational disruptions. You’ll also get practical tips to make this migration easy, giving you everything you need to prove you did it right. (Pre-emptive high five!)
Lastly, Datafold’s powerful AI-driven migration approach makes it faster, more accurate, and more cost-effective than traditional methods. With automated SQL translation and data validation, Datafold minimizes the strain on data teams and eliminates lengthy timelines and high costs typical of in-house or outsourced migrations.
This lets you complete a full-cycle migration with precision, often in a matter of weeks or months rather than years, so your team can focus on delivering high-quality data to the business. If you’d like to learn more about the Datafold Migration Agent, please read about it here.
Common Teradata to BigQuery migration challenges
When data engineers move their data from Teradata to BigQuery, they usually run into a few challenges. It takes a lot of detailed planning and careful work to make sure it’s done correctly. So, we’re going to dive into the stuff that usually throws these migrations off track: architecture and SQL dialects. When you understand how they’re different, you’ll have a better shot at a smooth transition. You’ll also see how these differences affect data distribution, query performance, and scalability.
Architecture differences
The architectural differences between Teradata and BigQuery are substantial, starting with the basics of how they work. Cloud-based platforms like BigQuery have very different approaches to storage and computation. Legacy platforms like Teradata have those two things all intertwined like spaghetti.
As you can imagine, going from an on-premises solution to a cloud-based solution is a big deal. Transitioning from physical infrastructure to a virtual, service-oriented architecture requires a big shift in how you think about and manage data scalability, storage, and computing resources.
Teradata uses a Massively Parallel Processing (MPP) architecture, which spreads data across multiple nodes for concurrent processing. It closely integrates data storage with computational power, which is important to understand as you migrate to BigQuery's distinct architecture.
Teradata’s architecture explained
Teradata clusters are composed of a network of Access Module Processors (AMPs), which serve as the core drivers of its data processing prowess. Each AMP operates on an individual server instance, bearing a share of both computational and storage duties for the cluster. In this configuration, storage and compute resources are tightly coupled, with specific data segments being stored and processed on the same server. Run out of one or the other and the whole ship goes down. :(
This integrated approach is at the heart of how Teradata's data management and query performance work. Similar to other Massively Parallel Processing (MPP) systems, such as AWS Redshift, Teradata uses a hashing algorithm to evenly distribute data across its AMPs.
However, this method presents several challenges in terms of scalability and performance:
- Reliance on hashing key for query efficiency: The effectiveness of queries heavily relies on their alignment with the hashing key. If joins and filters in a query do not correspond with the hashing key, it can lead to diminished performance due to the need for redistributing data across different AMPs.
- Scaling constraints: Expanding the system’s storage or computational power, necessary for handling more queries, involves adding more nodes. Scaling your database means scaling the entire infrastructure. Aside from being costly, adding nodes also necessitates redistributing data among both new and existing nodes. Although the recent introduction of Teradata VantageCloud offers compute-only clusters, these still rely on a Primary Cluster for storage, which can become a scalability bottleneck.
- Limited elasticity in compute demand: Teradata's architecture isn’t very flexible when it comes to scaling for query processing. Adjusting the cluster’s available storage is a complex task, often leading to either underuse of resources or performance bottlenecks. It gets expensive quickly and isn’t ideal as an enterprise data warehouse.
BigQuery architecture explained
BigQuery’s architecture is cloud native and serverless. It separates compute and storage functions, making it more scalable and flexible, while providing efficient resource usage.
BigQuery’s standout feature is its fully-managed, serverless data warehouse. Unlike Teradata, it automatically scales. As a result, compute resources are dynamically allocated and adjusted based on the workload. During times of high usage, such as business hours when data analysts are running complex queries or generating data analytics reports, BigQuery automatically ramps up to handle the load and scales down during periods of lower activity. Cheap, fast, and easy! Sweet!
Its adaptability is perfectly suited to meet varying data processing requirements. BigQuery's management is user-friendly and accessible (bonus!), offering intuitive controls through an API or a user interface. Even users with limited technical expertise can efficiently manage and interact with their data.
Benefits of migration to BigQuery
Embracing BigQuery as your data warehousing solution has many advantages, like operational efficiencies and enhanced data processing capabilities. Here are just a few big benefits:
- Reduced DevOps effort: Transitioning to BigQuery significantly eases the burden associated with DevOps tasks. And thank goodness, because DevOps people are always super busy. BigQuery simplifies maintenance and optimization. Its serverless nature means spending less time on maintenance and tuning.
- Shift to ELT (Extract, Load, Transform) approach: With Teradata, you typically need to filter and transform your data before you load it (ETL vs. ELT). This is how pipelines were designed pre-cloud because storage and compute were limited and not dynamically scalable. BigQuery, with its cloud native architecture, supports a more efficient ELT model: data can be loaded in its raw form and transformed within BigQuery as needed, giving data analysts more flexibility.
- Simplified data management and flexibility: Efficient data management in a Teradata data warehouse requires careful planning of table layouts, indices, and data distribution strategies. BigQuery automates most of this complexity. While it supports table partitioning and clustering for performance optimization, it doesn’t require the intricate setup and planning that Teradata does (see the sketch after this list). This streamlined approach makes it easier to organize and manage data.
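As a rough sketch of what that looks like in practice (table and column names here are hypothetical), partitioning and clustering are declared inline in a single DDL statement rather than planned as physical data distribution:

```sql
-- Hypothetical BigQuery table: partitioning and clustering are declared
-- inline; no primary index or data-distribution planning is required
CREATE TABLE sales.orders_partitioned (
  order_id     INT64,
  customer_id  INT64,
  order_amount NUMERIC,
  order_date   DATE
)
PARTITION BY order_date     -- prunes the data scanned by date filters
CLUSTER BY customer_id;     -- co-locates rows for common filters and joins
```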
Dialect differences between Teradata SQL and BigQuery SQL
The SQL used by Teradata is pretty different from the SQL used by BigQuery. It’s different enough that they have distinct approaches to basic queries. This makes it difficult to run SQL on one system and expect it to work in the other. You’ll especially see these compatibility issues when transferring your stored procedures, functions, and complex queries. When you understand the differences, you can carefully adapt your queries from one system to the other.
Teradata uses its variant of SQL, known as Teradata SQL. It includes unique syntax and functions tailored to its architecture. For instance, Teradata features specialized join types like MERGE JOIN, optimized for its parallel processing environment. It also offers proprietary functions for data manipulation and analysis and specific syntax for managing its unique indexing and partitioning features.
BigQuery standard SQL, based on ANSI SQL, enables efficient querying of large cloud datasets. It has modern features like array aggregation and user-defined functions (UDF). Tailored for big data analytics, it offers comprehensive functions for structured and semi-structured data like JSON and arrays. It also plays quite nicely with Google's cloud ecosystem. BigQuery standard SQL dialect supports real-time analysis and machine learning, enhancing BigQuery's scalability for complex queries on large datasets.
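As a small, hypothetical sketch of two of those features, here is a SQL user-defined function combined with array aggregation (the table and column names are made up for illustration):

```sql
-- Hypothetical example of BigQuery standard SQL features:
-- a SQL UDF plus ARRAY_AGG over grouped rows
CREATE TEMP FUNCTION order_size_label(order_amount NUMERIC) AS (
  IF(order_amount >= 1000, 'large', 'small')
);

SELECT
  customer_id,
  -- Collect each customer's order labels into an array, ordered by date
  ARRAY_AGG(order_size_label(order_amount) ORDER BY order_date) AS order_labels
FROM sales.orders
GROUP BY customer_id;
```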
Traditional translation tools often struggle with these complexities, turning what might seem like a straightforward task into a months- or even years-long process.
Datafold’s Migration Agent simplifies this challenge by automating the SQL conversion process, seamlessly adapting Teradata SQL code—including stored procedures, functions, and queries—for BigQuery. This automation preserves critical business logic while significantly reducing the need for manual intervention, helping teams avoid the lengthy, resource-intensive rewrites that are typical in traditional migrations.
Dialect differences between Teradata and BigQuery: The QUALIFY ROW_NUMBER() Function
A notable distinction between Teradata and BigQuery is in handling queries that select a top number of rows with potential ties. In Teradata, the TOP WITH TIES clause is commonly used, which returns the top N rows from a result set, along with any additional rows that tie for the last place in the top N based on the ordering criteria.
In BigQuery, you can get a similar result using the QUALIFY clause with a ranking window function. ROW_NUMBER() assigns a unique number to every row, so on its own it can’t represent ties; RANK(), by contrast, gives tied rows the same rank, which is what lets you include additional rows that tie with the Nth row. For example, to return the top 10 orders by OrderAmount, including any orders that tie with the 10th order, the BigQuery query is structured differently, using a ranking function to achieve the same outcome as the TOP WITH TIES clause in Teradata.
Example query: Teradata SQL and BigQuery SQL
Let’s explore a typical query used in the Teradata environment to illustrate its SQL capabilities. The following example, a minimal sketch that assumes a hypothetical Orders table with OrderID and OrderAmount columns, demonstrates a basic query to select the top 10 orders by amount, including ties:
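```sql
-- Teradata SQL: top 10 orders by amount, plus any rows that tie
-- with the 10th-highest OrderAmount
SELECT TOP 10 WITH TIES
    OrderID,
    OrderAmount
FROM Orders
ORDER BY OrderAmount DESC;
```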
To achieve the same result in BigQuery as the TOP 10 WITH TIES clause in Teradata, you can use a ranking window function such as RANK() along with a subquery or Common Table Expression (CTE). Here’s a sketch of the equivalent query in BigQuery SQL, again using the hypothetical Orders table:
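```sql
-- BigQuery SQL: equivalent of Teradata's TOP 10 WITH TIES
WITH RankedOrders AS (
  SELECT
    OrderID,
    OrderAmount,
    -- RANK() gives rows with equal OrderAmount the same rank,
    -- which is what lets ties with the 10th row be included
    RANK() OVER (ORDER BY OrderAmount DESC) AS amount_rank
  FROM Orders
)
SELECT
  OrderID,
  OrderAmount
FROM RankedOrders
WHERE amount_rank <= 10
ORDER BY OrderAmount DESC;
```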
BigQuery achieves a similar result to Teradata’s TOP WITH TIES by using window functions like RANK(). Here, RANK() assigns a rank to each order based on OrderAmount, and the WHERE clause filters for ranks up to 10, including ties. The same logic can also be written more compactly with BigQuery’s QUALIFY clause.
BigQuery's SQL dialect, while comprehensive, differs notably from Teradata's, particularly in Data Definition Language (DDL) features and command syntax. Teradata's DDL intricacies and shortcuts like SEL for SELECT aren't directly mirrored in BigQuery.
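A tiny illustration of that gap, again using the hypothetical Orders table:

```sql
-- Teradata accepts the SEL shorthand; BigQuery requires the full keyword
SEL OrderID, OrderAmount FROM Orders;      -- valid in Teradata only
SELECT OrderID, OrderAmount FROM Orders;   -- valid in both dialects
```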
Validation as a black box
Validation can become a “black box” in migration projects, where surface-level metrics like row counts may align, but hidden discrepancies in data values go undetected.
Traditional testing methods often miss deeper data inconsistencies, which can lead to critical issues surfacing only after the migration reaches production. Each failed validation triggers time-intensive manual debugging cycles, delaying project completion and straining resources.
Optimizing schema management in BigQuery migrations
Before you move any data to BigQuery, make sure you understand the table schemas on both systems. Look at your column names, data types, and every other attribute. You may need a custom schema for your migration, so review your schema files carefully and precisely match the structures between your legacy Teradata system and BigQuery.
This process is known as schema mapping. Doing it in BigQuery involves two key steps. First, the schema file must precisely reflect the structure of your source tables and your enterprise data needs. Second, after you move the data, you need to test and verify that the schema mapping worked as expected. A minimal sketch of this mapping is shown below.
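Here’s what that mapping might look like for a hypothetical orders table; the Teradata definition is shown in comments above its BigQuery equivalent:

```sql
-- Source table (Teradata DDL), shown for reference:
--   CREATE TABLE sales.orders (
--     order_id     INTEGER NOT NULL,
--     customer_id  INTEGER,
--     order_amount DECIMAL(12,2),
--     order_date   DATE,
--     order_note   VARCHAR(500)
--   ) PRIMARY INDEX (order_id);

-- Equivalent BigQuery DDL with mapped data types:
CREATE TABLE sales.orders (
  order_id     INT64 NOT NULL,
  customer_id  INT64,
  order_amount NUMERIC,   -- DECIMAL(12,2) maps to NUMERIC
  order_date   DATE,
  order_note   STRING     -- VARCHAR maps to STRING; no length needed
);
```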
Test everything!
Leveraging Google Cloud Storage
Consider using Google Cloud Storage as a preliminary step for staging your data. Staging might facilitate a more seamless and efficient data transfer to BigQuery. Google Cloud Storage is robust and scalable, especially when handling large volumes of data.
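If you stage files in Cloud Storage, a load into BigQuery can be as simple as the following sketch (the bucket path and table name are hypothetical):

```sql
-- Load CSV files staged in a Cloud Storage bucket into a BigQuery table
LOAD DATA INTO sales.orders
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://my-migration-bucket/teradata-export/orders/*.csv']
);
```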
BigQuery Data Transfer Service
For organizations looking to automate and streamline their data transfer process, the BigQuery Data Transfer Service can be a valuable asset when migrating from Teradata. It simplifies the movement of data from various sources into BigQuery and enhances the efficiency of schema mapping and data integration tasks.
Business challenges in migrating from Teradata to BigQuery
Transitioning to BigQuery presents not only technical but also non-technical business challenges: stakeholder alignment, budget management, and the impact on ongoing operations. You’ll want to focus on minimizing disruption and aligning the migration with business goals. Here are a few of the most common business challenges for data migrations:
- The never-ending “last mile”: Even when a migration is near completion, the “last mile” often proves to be the most challenging phase. During stakeholder review, previously unidentified edge cases may emerge, revealing discrepancies that don’t align with business expectations. This phase often becomes a bottleneck, as each round of review and refinement requires time and resources, potentially delaying full migration and user adoption.
- Stakeholder alignment: Getting all your stakeholders to actually agree is as difficult as it is important. Engage people from different departments early in the planning stage. Secure their support and address any concerns or misunderstandings about the transition. Communicate and manage expectations to keep the effort on track.
- Budget considerations: Ah, the classic problem of money. The costs of a data migration go way beyond tools and resources. You’ll need to run both the old and new systems concurrently for a while, which also means potential downtime costs, training for staff, and all the time and energy to keep the business running. Do as much budget planning as you’re able before you do the migration, doing your best to be comprehensive and inclusive of everyone’s requirements (within reason, of course).
- Minimizing downtime: Downtime can have a profound impact on business operations and revenue. You must maintain continuity during the migration. Consider a phased data migration, which moves data incrementally and can help reduce downtime. We highly recommend a lift-and-shift migration for this and a dozen other reasons.
Test absolutely everything before you turn off the old system. It’s the only way to ensure a seamless transition with minimal disruption.
4 best practices for Teradata to BigQuery migration
The transition from Teradata to BigQuery involves a complex landscape of database migration challenges. To succeed, you need to analyze your current Teradata setup, considering its structure, size, and complexity. Then you can plan a more effective and tailored migration strategy.
The following strategies can help you improve the efficiency and increase your chances of success:
- Have a strategy and prioritize your data assets: Begin by identifying and ranking your critical data assets using column-level lineage analysis. This’ll help you determine which assets are most important. Migrate your data consumption endpoints before your data production pipelines to reduce the burden on the old system during the migration process.
- Lift-and-shift approach: Leverage Datafold’s DMA for the initial lift-and-shift and move your data in its current form without remodeling or refactoring. It’ll simplify the migration, lower risks and resource demands, and speed up the entire process.
- Document your strategy and action plan: Integrate Datafold’s DMA to streamline SQL translation, validation, and documentation for a unified strategy. A comprehensive, well-documented migration strategy and action plan will provide clarity, facilitate stakeholder agreement, and allow for the early identification and correction of potential issues. Good documentation keeps teams aligned throughout the migration journey.
- Automating migrations with Datafold’s DMA: Datafold’s Migration Agent (DMA) simplifies the complex process of migrating SQL and validating data parity across Teradata and BigQuery. It handles SQL dialect translation and includes cross-database data diffing, which expedites validation by comparing source and target data for accuracy.
By automating these elements, DMA can save organizations up to 93% of time typically spent on manual validation and rewriting.
Putting it all together: Teradata to BigQuery migration guide
A successful migration from Teradata to BigQuery requires both technical expertise and solid project management. Using the best practices outlined in our guide will help you to effectively manage this complex transition. Here’s a structured approach to put it all together:
- Detailed migration planning: Develop a specific plan for the Teradata to BigQuery migration, which should include timelines, resource allocation, and stakeholder communication strategies. Begin by thoroughly documenting your current Teradata environment, understanding data dependencies, and setting clear objectives. Use Datafold’s Migration Agent (DMA) to prioritize critical data assets, starting with data consumption points like BI tools and dashboards to enable a smooth transition with minimal disruption.
- Prioritize data consumption endpoints first: Employ column-level lineage to identify and prioritize critical data assets. Start by migrating data consumption points such as BI tools and dashboards to BigQuery. This provides a seamless transition with minimal interruption to data access.
- Implement lift-and-shift strategy: Lift-and-shift data in its current state using the SQL translator embedded within Datafold’s DMA, which automates the SQL conversion process. Implement this approach to address the architectural and SQL dialect differences between Teradata and BigQuery.
- Begin by transferring data in its current form using a SQL translator tool like Datafold which can automate the conversion of SQL code from Teradata to BigQuery
- Conduct a data diff to quickly check for parity and ensure that there is a 1-to-1 correspondence between your legacy Teradata database and the new BigQuery database using Datafold’s cross-database data diffing
- Consider a BigQuery migration service as part of the lift-and-shift strategy to simplify and speed up the process (and refactor once you’ve migrated off of Teradata)
- Validate with cross-database diffing: Then, use Datafold’s cross-database diffing to verify data parity, enabling quick and accurate 1-to-1 table validation between the Teradata and BigQuery databases.
- Get stakeholder approval: Once the migration process is stable, get approval from stakeholders. Use data diffing to demonstrate complete parity between the Teradata and BigQuery systems, assuring stakeholders of the migration’s success.
- Deprecate old Teradata assets: In the final phase, issue deprecation notices for the old Teradata assets. Provide stakeholders with parity reports through data diffs to support their transition to the new BigQuery environment. You’ll need to do some hand-holding to make sure every user moves to the new system without any issues. Then you can decommission Teradata without any interruptions.
Conclusion
If you’re ready to learn more about how Datafold accelerates data teams’ work during a migration like the one from Teradata to BigQuery, reach out to a data migrations expert and tell us about your migration, tech stack, scale, and concerns. We’re here to help you understand if data diffing is a solution to your migration concerns.
As we said at the beginning of this blog: migrations are hard, potentially years-long projects. Datafold is here to make them as automated as possible, so your team can focus on what matters: providing high-quality data to your organization.