Data quality is your moat, this is your guide
Fortify your data, fortify your business: Why high-quality data is your ultimate defense.
Table of Contents
Act I: What is data quality?
Act II: Our approach to data quality
Act III: When does data quality matter?
Act IV: Building a culture & strategy for data quality
Act V: The roadmap
During data transformation development & deployment
High data quality during transformation development and deployment is crucial because it sets the foundation for every downstream process. Data practitioners tend to run into the same four problem areas when developing and deploying their transformation work, which is usually built with a tool like dbt or custom SQL models.
- When bad data reaches production
When bad data reaches production, it can lead to cascading issues throughout the data pipeline. This can occur due to errors in data transformation logic, inconsistencies in source data, or inadequate testing procedures.
- Slow development & deployment
One contributor to slow development and deployment is a manual, ad hoc data validation process. Data teams often inspect data by hand, whether through data unit tests or custom SQL queries, which is time-consuming, error-prone, and not standardized across teams. Because you should validate your data every time you change your code, re-running these ad hoc queries consumes valuable time and delays development.
- Lack of standardized data quality practices
Without standardized coding conventions, documentation practices, and version control procedures around data quality testing, it becomes difficult to maintain consistency and ensure the reliability of data transformation workflows. Also, a lack of governance can lead to inconsistencies in data models and increased technical debt over time.
- Scaling dbt projects to users, models, and tests
Whether you’re the founding data engineer of a startup or the 100th addition to a large organization’s analytics unit, you’ll encounter the same challenges around scaling projects for the next phase of growth. Maintaining data quality at scale requires far more intentionality and adherence to best practices. As dbt projects grow in complexity to accommodate more users, models, and tests, you are likely to run into performance bottlenecks, resource constraints, or difficulties managing dependencies between different components of the project.
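To make the cost of ad hoc validation concrete, here is a minimal sketch of what moving from one-off queries to a repeatable check might look like. It mirrors the idea behind dbt's `not_null` and `unique` generic tests, but it is not dbt's implementation. The function names `run_checks` and `build_demo_connection`, the `orders` table, and its columns are hypothetical, and SQLite stands in for a real warehouse.

```python
import sqlite3


def run_checks(conn, table, key_column):
    """Return counts of NULL and duplicated key values for one table.

    Hypothetical helper: table and column names are interpolated directly
    into the SQL, so only pass trusted identifiers.
    """
    # Count rows whose key is missing.
    null_count = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL"
    ).fetchone()[0]

    # Count distinct key values that appear more than once.
    duplicate_count = conn.execute(
        f"""
        SELECT COUNT(*) FROM (
            SELECT {key_column}
            FROM {table}
            WHERE {key_column} IS NOT NULL
            GROUP BY {key_column}
            HAVING COUNT(*) > 1
        )
        """
    ).fetchone()[0]

    return {"null_keys": null_count, "duplicate_keys": duplicate_count}


def build_demo_connection():
    """Create an in-memory table with one NULL key and one duplicated key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (2, 25.5), (2, 25.5), (None, 7.0)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_connection()
    print(run_checks(conn, "orders", "order_id"))
```

Because the same function can run against any table after any code change, the check is defined once and shared across the team, rather than being rewritten as a one-off query each time.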
How they are typically solved
There are four ways that practitioners have typically approached data quality testing during dbt project development and deployment, ranging from least mature to most mature.