April 8, 2025

Data migrations: 80% done, 80% left

Migrations stall for two reasons: gnarly models that resist translation and no clear way to prove the new system works like the old one. Without leverage, solving either becomes slow and expensive. The right tools, and the right kind of feedback loop, can turn “almost done” into done.

Elliot Gunn

You know the moment in a data migration when someone says, “We’re basically done, just a few models left to validate.” But then two weeks go by, or maybe two months, and you’re decoding institutional memory, undocumented edge cases, and ghost logic from a system no one fully understands.

The last 20% of a data migration isn’t spread evenly across the project. You’re 80% done, but somehow still have 80% left. Timelines stretch and stakeholders start losing confidence. The migration goes from “almost done” to “somehow still not done.”

We’ve seen this play out across dozens of migration stories. There are two reasons why this happens: the gnarly model problem, and the validation problem.

Where timelines break: The bottleneck model(s)

The first phase of a migration is intensive but largely defined. You’re translating SQL, porting stored procedures, migrating schemas, and reworking transformation logic to fit a new platform. 

Sure, it’s time-consuming and a little boring, but the logic is mostly out in the open. You might be rewriting hundreds of models or stored procedures, but you can scope it, assign it, and Google it. You already know that most of the trouble will come from known quirks like null handling, casing mismatches, and collation differences, and these issues are solvable with the right experience and a strong bag of tricks. There’s a clear path that you (and your SQL translator) can reason through.
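To make those quirks concrete, here’s a small sketch using SQLite collations as a stand-in for the dialects involved (the specific platforms and defaults are assumptions for illustration): case-insensitive collation and null propagation can each silently change results between systems.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Collation differences: a case-insensitive collation (like many legacy
# SQL Server CI defaults) says these strings are equal...
ci = con.execute("SELECT 'ABC' = 'abc' COLLATE NOCASE").fetchone()[0]
# ...while a case-sensitive collation (like Postgres/Snowflake defaults) disagrees.
cs = con.execute("SELECT 'ABC' = 'abc' COLLATE BINARY").fetchone()[0]
print(ci, cs)  # 1 0

# Null handling: ANSI-style concatenation with NULL yields NULL...
concat = con.execute("SELECT 'id-' || NULL").fetchone()[0]
print(concat)  # None
# ...so migrated code often needs an explicit COALESCE to match legacy output.
concat_fixed = con.execute("SELECT 'id-' || COALESCE(NULL, '')").fetchone()[0]
print(concat_fixed)  # id-
```

Each of these is trivial once spotted; the work is spotting them across hundreds of models.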

Then you hit that model: the one that’s a deeply layered operational report with 15 intermediate steps and a half-dozen undocumented filters. Suddenly, you’re stuck.

The model that looked like a week’s work is now a month of archaeology. It fans out into several new models, each touching different source systems or teams. And the behavior of the legacy system isn’t just complex; it’s wrong in ways people have come to depend on.

At this point, the migration isn’t about SQL anymore; it’s about reverse-engineering institutional memory.

Where confidence breaks: The validation problem

Let’s say you finally rewrite the gnarly model. That’s a win, right? 

Not quite. Now you have to prove that the new system behaves the same as the old one. And that’s where migrations hit the second wall: validation.

The legacy system had quirks: implicit sorting, undocumented filters, timezone assumptions baked into Excel models. You fix one thing, and another goes out of alignment. You rerun, retest, re-compare, write spot checks, stare at sample rows. But it never seems to satisfy the stakeholders, who ask: “Are we sure this matches production?”
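As a hypothetical example of the timezone problem: suppose the legacy Excel model treated naive timestamps as US Eastern, while the new warehouse stores everything as UTC (both assumptions for illustration). The very same event can then land in different daily buckets, and every daily aggregate downstream shifts.

```python
from datetime import datetime, timezone, timedelta

# A naive timestamp as the legacy Excel model stored it.
naive = datetime(2025, 4, 8, 23, 30)
eastern = timezone(timedelta(hours=-4))  # US Eastern during DST

legacy_day = naive.date()  # legacy bucketed this as April 8
utc_day = naive.replace(tzinfo=eastern).astimezone(timezone.utc).date()
print(legacy_day, utc_day)  # 2025-04-08 2025-04-09: same row, different bucket
```

One late-evening row per day is enough to make every “daily revenue” number disagree by a small, maddeningly inconsistent amount.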

This is the second reason migrations stall: you can’t ship what you can’t prove. And proving it, without the right tooling, turns into a slow, manual loop. Worse, it’s a loop with no clear definition of done. Even when the numbers match, confidence may not. If stakeholders don’t trust the outputs, the legacy system never gets turned off. 

Why this last stretch is so expensive

These two problems amplify each other. The gnarly models take forever to translate and rebuild – and also become the hardest to validate. 

This is when “almost done” turns into another quarter on the roadmap. The team isn’t in execution mode, but stuck trying to show that things “work the same way” even when there’s no single source of truth that proves it. You’re burning hours trying to reconstruct logic that was never fully written down in the first place.

Throwing more hours (and people) at the problem rarely helps. You can add more engineers, write more ad hoc tests, and re-run models for the tenth time. But without a structured way to validate data parity, you’re just hoping to stumble into confidence.

What helps is a system that:

  • Confirms that new outputs match legacy behavior
  • Traces mismatches to their root causes
  • Clearly communicates what changed, what didn’t, and why it’s safe to move forward
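A minimal sketch of that idea in Python (a toy illustration, not Datafold’s implementation): key both result sets on a primary key, then report rows present on only one side and per-column mismatches for the rest.

```python
def diff_tables(legacy: list[dict], new: list[dict], key: str) -> dict:
    """Compare two result sets row by row, keyed on a primary key.

    Returns matched-row count, keys present on only one side, and
    per-column mismatches traced back to their keys.
    """
    legacy_by_key = {r[key]: r for r in legacy}
    new_by_key = {r[key]: r for r in new}

    only_legacy = sorted(legacy_by_key.keys() - new_by_key.keys())
    only_new = sorted(new_by_key.keys() - legacy_by_key.keys())

    mismatches = {}
    matched = 0
    for k in legacy_by_key.keys() & new_by_key.keys():
        diffs = {
            col: (legacy_by_key[k][col], new_by_key[k].get(col))
            for col in legacy_by_key[k]
            if legacy_by_key[k][col] != new_by_key[k].get(col)
        }
        if diffs:
            mismatches[k] = diffs  # trace each mismatch to key + columns
        else:
            matched += 1

    return {"matched": matched, "only_legacy": only_legacy,
            "only_new": only_new, "mismatches": mismatches}
```

Production tools push this comparison into the warehouse so it scales to billions of rows, but the contract is the same: match, locate, explain.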

That’s not something you get from more effort. That’s something you get from applying leverage.

Leverage turns “almost done” into done

What I mean by leverage is collapsing the cost of iteration. The loop (translate, test, validate, fix, repeat) has to be tight, structured, and fast. That’s how you shrink the hardest part of a migration down to something manageable. 

Leverage means designing your tools and workflow to work with the problem, not around it. It means building context into your comparisons, aligning inputs before you diff outputs, and generating proof that scales across hundreds, or thousands, of models. 

Getting help from an expert, or something like one

The Datafold Migration Agent (DMA) was built to support end-to-end migrations, but it’s in the final 20%, where ambiguity, edge cases, and missing documentation stall progress, that DMA becomes critical.

It starts by aligning source and target inputs so you’re diffing clean data. Then it translates transformation logic, whether that’s legacy SQL, nested GUI-based pipelines, or tool-specific abstractions, into modern, testable code. In many cases, that means unpacking layers of logic before you can even write SQL.

Once translated, DMA runs row- and column-level diffs between systems to catch mismatches that hand-written tests would have missed; you’ll never have complete test coverage, because there will always be unknown unknowns.

If outputs don’t match, DMA refines the translation, reruns the diff, and loops until you’ve reached validated parity. When it’s done, you have a complete, validated record of what changed, what didn’t, and why it’s safe to move forward.
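The shape of that loop can be sketched in a few lines. The callables here (`translate`, `run_diff`, `refine`) are hypothetical stand-ins, not DMA’s actual API; the point is that the loop terminates on proof, not on effort.

```python
def migrate_until_parity(translate, run_diff, refine, max_rounds=10):
    """Iterate translation until the diff between legacy and new outputs is empty.

    translate() -> initial translated code
    run_diff(code) -> list of row/column-level differences (empty means parity)
    refine(code, diff) -> improved translation informed by the diff evidence
    """
    code = translate()
    for round_no in range(1, max_rounds + 1):
        diff = run_diff(code)
        if not diff:
            return code, round_no  # validated parity reached
        code = refine(code, diff)  # fix the translation using the diff
    raise RuntimeError(f"no parity after {max_rounds} rounds; remaining diff: {diff}")
```

The key design choice is that the diff itself drives the next refinement, so each pass narrows the gap instead of restarting the guesswork.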

See the feedback loop in action

That’s what real leverage looks like: the ability to prove parity with speed, confidence, and context. You get a tight feedback loop that builds trust without burning engineering hours on every pass. 

Want to see how this kind of loop works in real migrations? Book a demo with our team. We’ll walk you through how DMA handles code translation, validation, and diffing, so “almost done” finally means done.
