December 5, 2024
CI/CD, Data Quality Best Practices, Data Testing

It’s almost 2025 and data teams are still stuck in the ‘Pre-CI’ era

Start small. Build trust. And stop waiting.

Elliot Gunn, Alex Mill
At every tradeshow and conference we attended this year, data teams echoed the same frustrations:

“We’re not ready for CI yet.”

“Our pipelines are too small to justify it.”

“Leadership doesn’t see the need–everything seems fine.”

It’s a familiar story. We’ve seen this play out before, but in a different industry. 

In the early days of software engineering, testing and CI weren’t priorities. Developers shipped code, fixed bugs as they popped up, and dealt with the fallout. This worked for a while–until it didn’t. Code broke, users churned, and teams were buried under reactive fixes. Eventually, the cost calculus of ignoring CI shifted enough to change industry standards.

Today, data teams are making those same mistakes. The difference? We already know how this story ends and we don’t need to learn the hard way. 

“We’re not ready” is a false start

Many teams we spoke with genuinely believe they’re not ready for CI. They think it’s something for bigger, more mature organizations. Some leaders dismiss it entirely: “Nothing’s broken, so why invest?” Others told us they were “waiting for the right time” to implement CI. 

But here’s what’s happening beneath the surface: every team that skips CI is already paying for that decision. The bill just arrives later, in ways they haven’t yet connected to it.

Bugs don’t disappear–they show up late, typically in production. Dashboards break, metrics look a little funky, and trust in data erodes. Engineers end up putting out fires instead of building new products.

In the moment, this feels manageable. A team might reason, “Fixing problems as they arise is cheaper than investing in CI right now.” It’s not. 

Waiting comes with a cost. Every broken pipeline, every bad decision, and every wasted hour is time you can’t get back. Small fires add up, and by the time you realize how much damage they’ve caused, you’re stuck in a hole that’s harder—and more expensive—to climb out of.

The hidden costs of skipping CI

Most teams default to thinking about first-order effects because they are immediate, visible, and quantifiable. Bugs show up in production, engineers fix them, and life goes on. 

But skipping CI creates second-, third-, and fourth-order effects that eventually break teams:

Broken trust

When data breaks, so does trust. Executives stop believing what they see on dashboards. Teams start second-guessing reports. Product decisions get made on gut instinct instead of reliable data.

Once trust is lost, it’s hard to get back. Even after the data is fixed, skepticism lingers: “Can we really trust this number?”

“Data avoidance”

As trust deteriorates, teams stop relying on data altogether. Analytics go unused. Critical decisions get made without the data team in the room. Bad data becomes the default, and data-driven culture becomes a slogan without real stakes. 

Reaction and stagnation

For data teams, skipping CI means trading innovation for constant firefighting. Instead of solving interesting problems, engineers spend their days fixing bugs that could have been avoided. Over time, this leads to burnout, frustration, and stalled progress. 

No one wants to do work they’re not proud of.

These effects don’t just happen once; they create a self-reinforcing cycle:

No CI → bad data → more bugs → even less time for CI → repeat

The longer this cycle goes on, the harder it is to break.

Why data quality testing can be different

When software teams adopted CI, they had to invent everything from scratch—frameworks, tools, and processes. Tools like Jenkins, GitHub Actions, and automated testing frameworks didn’t exist yet. It took a while to figure out. Developers stumbled on new and better ways of doing things as they went along. 

Data teams don’t have to stumble through the same journey. The tools, frameworks, and principles already exist, and are easily adaptable to data workflows:

  • Version control: Tools like Git allow teams to track and manage changes to analytics code.
  • Continuous integration: Modern tools test changes to data models and pipelines before they hit production.
  • Diffing: Software engineers use git diff to track code changes. Data engineers have data diff. Tools like Datafold can directly compare datasets at the value level, highlighting unintended changes or unforeseen errors that might otherwise go unnoticed.
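
To make the idea of a value-level diff concrete, here is a minimal sketch in plain Python. It is not how Datafold or any other tool implements diffing; the function name data_diff, the order_id key, and the sample rows are all hypothetical, chosen only to show what "comparing datasets at the value level" can mean for two versions of the same table.

```python
from typing import Any

Row = dict[str, Any]


def data_diff(before: list[Row], after: list[Row], key: str) -> dict[str, list]:
    """Compare two versions of a table row by row, matched on a primary key."""
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}

    # Keys present in only one of the two versions.
    removed = sorted(before_by_key.keys() - after_by_key.keys())
    added = sorted(after_by_key.keys() - before_by_key.keys())

    # For keys present in both versions, compare every column value.
    changed = []
    for k in sorted(before_by_key.keys() & after_by_key.keys()):
        old, new = before_by_key[k], after_by_key[k]
        for column in sorted(old.keys() | new.keys()):
            if old.get(column) != new.get(column):
                changed.append((k, column, old.get(column), new.get(column)))

    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical "production" and "development" versions of the same model.
prod = [
    {"order_id": 1, "amount": 100.0, "status": "paid"},
    {"order_id": 2, "amount": 250.0, "status": "paid"},
    {"order_id": 3, "amount": 75.0, "status": "refunded"},
]
dev = [
    {"order_id": 1, "amount": 100.0, "status": "paid"},
    {"order_id": 2, "amount": 205.0, "status": "paid"},
    {"order_id": 4, "amount": 60.0, "status": "paid"},
]

result = data_diff(prod, dev, key="order_id")
print(result)
```

In this toy example the diff surfaces one added row, one removed row, and one changed amount for order 2: the kind of quiet change that a code review alone would likely miss.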

The tools are also better now. For all the hand-wringing over the Modern Data Stack, more choice also means more competition and increasing accessibility over time. 

CI builds trust, and trust compounds

Trust is a scarce commodity in data, but with even a basic CI setup, trust compounds over time:

  1. Fewer bugs: Testing catches problems early, before they reach production.
  2. Stronger trust: Leaders and teams gain confidence in the accuracy of their data.
  3. Faster iteration: Engineers spend less time fixing problems and more time building.
  4. Better insights: Cleaner data leads to clearer, more actionable insights.
  5. Greater adoption: As trust grows, more teams rely on data to make decisions.

This is how you create a trust flywheel. Each step makes the next one easier, and the momentum builds.

Ditch perfection for progress

CI isn’t about perfection. It’s about progress. The sooner you start, the sooner you’ll build trust in your data, your team, and your decisions.

You don’t need to overhaul your entire workflow overnight. Start small, focus on testing the highest-value pipelines, and go from there. There are plenty of CI tutorials around; here’s one that gets you started with just a 43-line script in GitHub Actions that any junior data engineer can implement.

This approach is simple, manageable, and immediately valuable. You’ll see results quickly, and those early wins will build momentum for bigger improvements.
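
As a rough illustration of how small a first step can be, the sketch below shows a handful of data checks that a CI job could run against one high-value table. It is not the tutorial script mentioned above; the check functions, the order_id and amount columns, and the sample rows are assumptions made up for this example.

```python
def check_not_null(rows, column):
    """Return one failure message per row where the column is missing or None."""
    return [
        f"row {i}: {column} is null"
        for i, row in enumerate(rows)
        if row.get(column) is None
    ]


def check_unique(rows, column):
    """Return one failure message per row that repeats an earlier value."""
    seen, failures = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            failures.append(f"row {i}: duplicate {column}={value!r}")
        seen.add(value)
    return failures


def run_checks(rows):
    """Run every check and collect all failures instead of stopping at the first."""
    return (
        check_not_null(rows, "order_id")
        + check_unique(rows, "order_id")
        + check_not_null(rows, "amount")
    )


# Hypothetical sample of a table built from a pull request's changes.
sample_rows = [
    {"order_id": 1, "amount": 100.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 30.0},
]

failures = run_checks(sample_rows)
for message in failures:
    print(message)

# A CI step fails when its command exits non-zero, so a real job
# would finish with: raise SystemExit(exit_code)
exit_code = 1 if failures else 0
print(f"exit code: {exit_code}")
```

Collecting every failure before exiting is a deliberate choice here: a reviewer sees the full list of problems in one CI run rather than fixing them one at a time.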

If you’re unsure where to begin, we can help. We’ve worked with teams of all sizes to implement CI successfully, whether they’re setting up their first tests or scaling CI for a growing data platform. Set up a free CI consultation to talk about what CI setup makes sense for your specific data environment and infrastructure.
