
How a leading healthcare staffing firm migrated MySQL to Snowflake 6x faster with Datafold
One of America's largest healthcare staffing firms cut their MySQL-to-Snowflake migration time from 6 months to 1 month using the AI-powered Datafold Migration Agent. They reduced costs by 80% while ensuring perfect data parity across 1,000+ custom queries.
Introduction
One of America's largest healthcare staffing firms cut their MySQL-to-Snowflake migration time from 6 months to 1 month using the AI-powered Datafold Migration Agent. They reduced costs by 80% while ensuring perfect data parity across 1,000+ custom queries.
The challenge
The company maintained a critical reporting system built on MySQL. These reports, which are powered by complex queries that transform data ingested from upstream sources, provided essential credentialing information for healthcare professionals, ensuring they could legally practice in specific locations. The firm needed to migrate these reports to Snowflake without any service disruptions to customers.
The migration presented several complexities:
- Approximately 1,000 custom MySQL queries needed translation
- Many of the queries contained complex joins and subqueries requiring careful translation
- Reports had to produce identical results in both environments to maintain data parity
- Manual validation of each query would require enormous effort and introduce risk
- Some queries exhibited special handling requirements, including non-deterministic behaviors
The company’s reporting ecosystem had evolved organically over the years, making it challenging to identify and categorize all the queries that required migration.
Initially, the data team considered a traditional approach, using contractors for manual SQL translation and validation. They estimated that the contractor approach would take 6-9 months to complete. But given the critical nature of these reports and the resource and timeline required by contractors, this approach simply didn’t align with their business objectives and timelines.
The solution
The company leveraged Datafold's AI-powered Data Migration Agent (DMA), a full-cycle migration solution that addressed their key challenges:
AI-powered SQL translation with continuous refinement
Datafold deployed DMA to translate MySQL queries to Snowflake syntax, with outputs validated using Datafold's proprietary data diff technology. Unlike traditional translators that rely on static code parsing and perform no validation of outputs, Datafold's DMA leverages large language models (LLMs) with a validation-driven feedback loop that intelligently identifies and corrects translation errors. This approach handled complex SQL patterns that simple translation tools would miss, creating a self-improving system that continuously refined the code until achieving parity.
Cross-database diffing for comprehensive validation
DMA’s one-two punch is the combination of translation with Datafold’s industry-leading cross-database data diffing that provides value-level comparison between source and target environments. For the company, this revealed detailed schema comparison, summary statistics, and row-level differences between the outputs of the original queries and the translated Snowflake versions. This automated validation eliminated the need for manually checking innumerable data points across reports and provided clear evidence of migration success.
Specialized handling for edge cases
For queries with special characters or non-deterministic behavior, the Datafold team implemented specialized handling techniques for DMA that addressed MySQL-specific behaviors. This ensured consistent results even when dealing with invisible characters, trailing spaces, and dialect-specific functions.
Batched delivery through pull requests
Datafold delivered translations in batches through pull requests to the company’s GitHub repositories, allowing for continuous integration and testing throughout the project. Continuous delivery allowed for fast feedback loops and quickly built confidence among executive stakeholders that the migration was significantly accelerated.
The results
This leading healthcare staffing firm realized significant benefits from the Datafold implementation:
Timeline reduction from months to weeks
According to the team, Datafold compressed what would have been a months-long project into just weeks. The automated approach allowed the team to parallelize work that would have been sequential if they’d instead used contractors, dramatically accelerating their timeline while maintaining quality.
Data validation at massive scale
The team reported that automated cross-database validation provided a level of confidence that simply wasn't possible with a manual validation process. With Datafold, they could verify that every value matched perfectly between systems, eliminating the possibility of subtle data discrepancies that could impact critical decisions.
Redirecting internal resources to high-impact work
Ultimately, the team was able to reduce their contractor needs by over 60% while simultaneously accelerating the project. This allowed them to redirect budget and internal resources to strategic initiatives rather than migration work.
Looking ahead
Beyond the initial migration, the company is implementing additional Datafold capabilities. The team quickly realized Datafold's capabilities extended far beyond their initial migration use case. Data lineage, automated testing, and observability and monitoring tools are helping them build a more robust data ecosystem for the long term.
- CI/CD Integration: Automatically testing dbt model changes to predict downstream impacts before deployment
- Data Quality Monitoring: Setting up observability for critical metrics with anomaly detection
- Impact Analysis: Using lineage to trace how changes affect downstream Tableau dashboards
What began as a migration tool has evolved into a strategic platform for ensuring data quality across the company’s entire analytics stack.