Introducing No-Code CI

Introducing No-Code CI in Datafold: Easily integrate data diffing into your CI workflow—regardless of your data transformation and orchestration tooling—for improved data quality, proactive issue detection, and streamlined PR reviews.


Gleb Mezhanskiy, Kira Furuichi, Nick Carchedi

We’re excited to announce our new No-Code CI integration that allows customers to quickly and easily incorporate data diffing into their code review process, regardless of their data stack.

Current Datafold customers already get significant value from running data diffs directly in their continuous integration (CI) process. Diffing in CI lets them catch data quality issues proactively, within the pull request (PR), before those issues reach production.

For any given set of code changes in a PR, Datafold needs to know which versions of a table to diff (e.g., staging vs. prod, dev vs. staging, or dev vs. prod). For data teams using dbt, Datafold can infer this automatically from the git repository and the dbt manifest.json.
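Datafold's actual inference logic is not described in this post. The following is only a minimal sketch of the idea, assuming a simplified view of the `nodes` section of two dbt manifest.json files (one built against production, one built from the PR branch) plus a list of files changed in the PR. The function names `relation` and `pair_changed_models` are hypothetical.

```python
from typing import Dict, List, Tuple

# Simplified, hypothetical view of dbt manifest.json "nodes" entries.
# Real manifests contain many more fields; only a few are used here.
Node = Dict[str, str]


def relation(node: Node) -> str:
    """Build a fully qualified database.schema.alias name for a model node."""
    return f"{node['database']}.{node['schema']}.{node['alias']}"


def pair_changed_models(
    prod_nodes: Dict[str, Node],
    pr_nodes: Dict[str, Node],
    changed_files: List[str],
) -> List[Tuple[str, str]]:
    """Return (prod_relation, pr_relation) pairs for models whose SQL file changed."""
    pairs = []
    for unique_id, pr_node in pr_nodes.items():
        if pr_node.get("resource_type") != "model":
            continue
        if pr_node["original_file_path"] not in changed_files:
            continue
        prod_node = prod_nodes.get(unique_id)
        if prod_node is None:
            continue  # New model: nothing in production to compare against.
        pairs.append((relation(prod_node), relation(pr_node)))
    return sorted(pairs)
```

Matching on each node's `unique_id` is what lets the same model be located in both builds even though it lives in a different database or schema in each environment.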

Data teams that use other transformation and orchestration methods, such as stored procedures, SQL models, or Airflow, can now tell Datafold explicitly what to diff using the new No-Code CI workflow and push the diff results directly into PR comments. Simply connect the git repository that contains your data transformation code, and Datafold will do the testing for you.

Now, regardless of the transformation or orchestration tooling their team uses, data engineers can bring data diffing into their PR workflow, see exactly how their code changes affect the data itself, and work with greater speed and confidence.

Why bring data diffing into the CI process

We’ve written extensively about why integrating testing such as data diffing into your CI process is one of the best ways to prevent unexpected data quality issues, so we won’t go into much depth here.

For data teams that are maturing their data pipelines or just getting started with CI, data diffing in CI enables teams to do three things:

Govern your data quality testing practices: By integrating data diffing in the CI process, data teams ensure every single PR and new code change undergoes the same testing requirements. No more guessing if every engineer tested their code before deploying to prod.
Understand how code changes impact the data: A data diff shows a value-level comparison of two versions of a table (say a staging and production version of a DIM_ORGS table). With a data diff, data engineers know exactly how the data itself will change with the code change, so they can identify any unexpected regressions before deploying to production. In addition, because Datafold integrates with BI tools like Tableau and Looker, data teams have visibility into the full downstream impact of merging a code change.
Streamline the PR review process: Armed with a data diff, PR reviewers can understand the true implications of code changes without extensive manual testing, increasing the overall velocity of data development and deployment.
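To make "value-level comparison" concrete, here is a toy, in-memory sketch that compares two versions of a small table matched on a primary key. It assumes each table fits in memory as a list of dicts and that the key column (here a hypothetical `org_id`) is unique. It is not meant to reflect how Datafold computes diffs at warehouse scale.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def value_diff(prod: List[Row], staging: List[Row], key: str) -> Dict[str, Any]:
    """Compare two versions of a table row by row, matched on a primary key."""
    prod_by_key = {row[key]: row for row in prod}
    staging_by_key = {row[key]: row for row in staging}

    # Rows that exist in only one of the two versions.
    only_in_prod = sorted(prod_by_key.keys() - staging_by_key.keys())
    only_in_staging = sorted(staging_by_key.keys() - prod_by_key.keys())

    # For keys present in both versions, record every column whose value changed.
    changed: List[Tuple[Any, str, Any, Any]] = []
    for k in sorted(prod_by_key.keys() & staging_by_key.keys()):
        before, after = prod_by_key[k], staging_by_key[k]
        for col in sorted(before.keys() | after.keys()):
            if before.get(col) != after.get(col):
                changed.append((k, col, before.get(col), after.get(col)))

    return {
        "only_in_prod": only_in_prod,
        "only_in_staging": only_in_staging,
        "changed_values": changed,
    }
```

For example, if production has org 2 on the `pro` plan while staging has it on `enterprise` and also contains a new org 3, the result lists org 3 under `only_in_staging` and `(2, "plan", "pro", "enterprise")` under `changed_values`. That is the kind of regression a row count or schema check alone would miss.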

How it works

To get started with No-Code CI, users simply need to set up a new CI integration in Datafold and select the "No-Code" option.

Users will be prompted to connect the git repository that contains the data transformation code they want tested, as well as the warehouse that contains the data to be diffed.

Now, when a pull request is opened in the specified repo, Datafold will automatically post a comment with a link to add data diffs.
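The trigger works roughly like any PR-driven bot. The sketch below assumes a GitHub `pull_request` webhook payload (which carries an `action` field such as `"opened"`, plus `repository.full_name` and `pull_request.number`). The link target `APP_BASE_URL` and its query parameters are hypothetical placeholders, not Datafold's real URL scheme.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

# Hypothetical placeholder; the actual link Datafold posts is not documented here.
APP_BASE_URL = "https://app.example.com/ci/add-diffs"


def comment_for_event(event: Dict[str, Any]) -> Optional[str]:
    """Return comment text for a newly opened PR, or None for any other event."""
    if event.get("action") != "opened":
        return None
    # URL-encode the repo name so the "/" in "owner/repo" is safe in a query string.
    query = urlencode({
        "repo": event["repository"]["full_name"],
        "pr": event["pull_request"]["number"],
    })
    return f"Add data diffs for this PR: {APP_BASE_URL}?{query}"
```

Ignoring every action other than `"opened"` keeps the bot from re-posting the link each time new commits are pushed to the same PR.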

Within the Datafold app, users specify the tables they want to compare (e.g., production.analytics.dim_orgs vs. staging.analytics.dim_orgs). Datafold automatically runs the diff and uses column-level lineage to list any downstream assets (tables, BI dashboards, etc.) that may be affected by the changes.
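Finding downstream impact from column-level lineage is essentially a graph traversal. This sketch assumes lineage is available as a simple adjacency map from each column to the columns or BI assets that read from it. The node naming (`schema.table.column`, `tableau:...`, `looker:...`) is purely illustrative and does not represent Datafold's internal lineage format.

```python
from collections import deque
from typing import Dict, Iterable, List


def downstream_assets(lineage: Dict[str, List[str]], changed_columns: Iterable[str]) -> List[str]:
    """Breadth-first walk of a lineage graph, returning everything downstream of the changed columns."""
    start = set(changed_columns)
    seen = set()
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # The `seen` check also guards against cycles.
                seen.add(child)
                queue.append(child)
    # Report only affected assets, not the columns that were changed directly.
    return sorted(seen - start)
```

With `{"analytics.dim_orgs.plan": ["analytics.fct_revenue.plan"], "analytics.fct_revenue.plan": ["tableau:Revenue by Plan"]}`, a change to `analytics.dim_orgs.plan` surfaces both the downstream table column and the dashboard. Tracing at the column level, rather than the table level, avoids flagging assets that only read untouched columns.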

Users now also have the option to "Save and Add Preview to PR", which automatically pushes the latest data diff summary as a comment on the PR that triggered the workflow.
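As a rough illustration of what a pushed summary might contain, the sketch below renders per-diff counts as a Markdown comment body. The metric names in `stats` and the layout are hypothetical and do not reproduce Datafold's actual comment format. Delivering such a body to GitHub could be done, for example, through the REST API's "create an issue comment" endpoint (`POST /repos/{owner}/{repo}/issues/{issue_number}/comments` with a `body` field), which also covers PR conversation comments.

```python
from typing import Dict


def diff_summary_comment(prod_table: str, pr_table: str, stats: Dict[str, int]) -> str:
    """Render a Markdown comment body summarizing one data diff (hypothetical format)."""
    lines = [
        f"### Data diff: `{prod_table}` vs `{pr_table}`",
        "",
        "| Metric | Count |",
        "| --- | ---: |",
    ]
    # Hypothetical metric names; missing metrics are reported as 0.
    for metric in ("rows_only_in_prod", "rows_only_in_pr", "rows_with_changed_values"):
        lines.append(f"| {metric} | {stats.get(metric, 0)} |")
    return "\n".join(lines)
```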
