May 8, 2024
dbt, Data Quality Best Practices, Data Testing

dbt Exposures: What are they and how to use them

Learn how dbt Exposures extend your dbt DAG to identify downstream assets like BI dashboards, and what it takes to define and maintain them manually in YAML.

Gleb Mezhanskiy

The output of our work as data practitioners is data products: datasets, dashboards, reports, ML models. No matter how complex or lengthy the pipelines are, it's the final data products that make an impact on the business.

Therefore, it’s essential to understand how data consumers use the data produced by dbt pipelines, usually through some form of data lineage. dbt ships with automatic table-level lineage, and dbt Cloud provides column-level lineage in the dbt project via dbt docs, but both automatically track only dbt sources and models, not how the models are used by the business.

dbt Exposures extend dbt's native docs and allow dbt developers to document end-user data products and their dependencies within the dbt DAG (directed acyclic graph). Exposures can help answer questions, including:

  1. Which dbt models are upstream of dashboard X?
  2. Who owns dashboard X?
  3. Which downstream applications may break if we update model Y?
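As a rough illustration, the first two questions (and the direct-dependency form of the third) can be answered by reading the exposure entries that dbt writes to its manifest.json artifact. The sketch below is not an official dbt API: it uses a hand-written, trimmed-down dictionary shaped like the manifest's "exposures" section, and the project name my_project and the helper function names are assumptions made for the example.

```python
# Hypothetical, trimmed-down stand-in for the "exposures" section of dbt's
# manifest.json artifact; only the fields used below are included.
manifest = {
    "exposures": {
        "exposure.my_project.report_daily_kpi": {
            "name": "report_daily_kpi",
            "owner": {"name": "Terry Soulcounter", "email": "terry@greatbeyond.com"},
            "depends_on": {
                "nodes": [
                    "model.my_project.fct_transactions",
                    "model.my_project.dim_customers",
                    "source.my_project.gsheets.goals",
                ]
            },
        }
    }
}


def find_exposure(manifest, name):
    """Return the exposure entry with the given name, or None."""
    for exposure in manifest["exposures"].values():
        if exposure["name"] == name:
            return exposure
    return None


def upstream_of(manifest, exposure_name):
    """Question 1: which nodes are directly upstream of an exposure?"""
    exposure = find_exposure(manifest, exposure_name)
    return [] if exposure is None else list(exposure["depends_on"]["nodes"])


def owner_of(manifest, exposure_name):
    """Question 2: who owns an exposure?"""
    exposure = find_exposure(manifest, exposure_name)
    return None if exposure is None else exposure["owner"]


def exposures_depending_on(manifest, node_id):
    """Question 3 (direct dependencies only): which exposures use a node?"""
    return sorted(
        exposure["name"]
        for exposure in manifest["exposures"].values()
        if node_id in exposure["depends_on"]["nodes"]
    )


print(upstream_of(manifest, "report_daily_kpi"))
print(owner_of(manifest, "report_daily_kpi"))
print(exposures_depending_on(manifest, "model.my_project.fct_transactions"))
```

In a real project the dictionary would come from loading target/manifest.json with the json module rather than being typed by hand.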

dbt exposures: Example of usage

Exposures are defined in YAML files nested under the "exposures" key.

Code example of a BI dashboard as a dbt Exposure


version: 2

exposures:

  - name: report_daily_kpi
    label: Daily KPI Dashboard
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/100
    description: >
      Weekly KPI dashboard that execs are looking at daily

    depends_on:
      - ref('fct_transactions')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('revenue')

    owner:
      name: Terry Soulcounter
      title: Accountant
      email: terry@greatbeyond.com


Exposures in the dbt DAG

Once added to your dbt project via YAML, exposures will appear in your dbt DAG (both in the local docs site and in the dbt Cloud documentation site) as orange nodes. As in most lineage graphs, you can click on specific nodes and paths for your exposures to clearly understand upstream and downstream dependencies.

dbt exposures: A powerful [and underused] feature
Source: dbt Labs

Exposures best practices

As with any governance feature, it’s important to think about best practices when implementing exposures.

#1: Start small with the business-critical exposures

While exposures are a powerful feature, adding exposure tracking for a mature project with thousands of BI and other dependencies can be overwhelming. When you first adopt exposures, it’s best to begin with the top 10 most important data products. These are usually well known, e.g., an executive KPI dashboard or a reverse-ETL sync into a CRM. Starting small lets you and the team get familiar with the exposures framework and facilitates wider adoption.

#2: Establish team guidelines

Once you’ve added exposure tracking for the essential assets, it may be a good time to establish team guidelines, e.g., "every data/analytics engineer should maintain exposures for the BI assets they own" or "when creating a dashboard for stakeholders, always add an exposure."

Having clear guidelines makes it easy to maintain and enforce team-wide curation of exposures.

#3: Keep exposures healthy

As with dbt tests, it’s essential to keep exposures up-to-date. Exposures become stale when, for example, owners are no longer with the company, the BI tool URL is broken, or a dashboard was deprecated but is still tracked. Once that happens, data team members and business users will eventually lose trust in exposures and stop using them, which is the opposite of what we want. Returning to best practice #1: it’s better to have a few high-quality exposures that stay up-to-date than hundreds that are stale and untrusted.
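One way to catch stale exposures early is a small lint script that runs on a schedule or in CI. The sketch below is only an illustration under stated assumptions: the exposure dictionaries are hand-written and shaped like exposure entries in dbt's manifest.json, the set of active staff emails is a hypothetical input you would have to source yourself (for example from an HR directory), and the function name lint_exposure is made up for this example. It checks only the shape of the URL, not whether the link actually resolves.

```python
def lint_exposure(exposure, active_emails):
    """Return a list of human-readable problems found in one exposure."""
    problems = []

    # Owner check: an email must be present and belong to current staff.
    owner = exposure.get("owner") or {}
    email = owner.get("email")
    if not email:
        problems.append("owner has no email")
    elif email not in active_emails:
        problems.append(f"owner {email} is not in the active staff list")

    # URL check: shape only; no network request is made here.
    url = exposure.get("url") or ""
    if not url.startswith(("http://", "https://")):
        problems.append("url is missing or not an http(s) link")

    # Dependency check: an exposure without upstream nodes documents nothing.
    if not (exposure.get("depends_on") or {}).get("nodes"):
        problems.append("no upstream dependencies declared")

    return problems


# Hypothetical sample data: one healthy exposure and one stale exposure.
exposures = [
    {
        "name": "report_daily_kpi",
        "url": "https://bi.tool/dashboards/100",
        "owner": {"name": "Terry Soulcounter", "email": "terry@greatbeyond.com"},
        "depends_on": {"nodes": ["model.my_project.fct_transactions"]},
    },
    {
        "name": "old_finance_report",
        "url": "",
        "owner": {"name": "Former Employee", "email": "gone@greatbeyond.com"},
        "depends_on": {"nodes": []},
    },
]
active_emails = {"terry@greatbeyond.com"}

report = {e["name"]: lint_exposure(e, active_emails) for e in exposures}
for name, problems in report.items():
    print(name, "OK" if not problems else problems)
```

A check like this does not replace human curation, since it cannot tell whether a dashboard has been deprecated, but it flags the most mechanical forms of staleness.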

dbt exposures limitations

While exposures are a simple and powerful way to document downstream data applications in a dbt project, they have two fundamental limitations.

dbt exposures must be manually created and maintained with YAML, which does not scale effectively

The more widely data is adopted in the organization (a good thing), the harder it is for the data team to keep track of all downstream uses of the data they produce. In my data engineering days at Lyft, we used to have over 100 major dashboards across Looker and Tableau and over 10,000 reports in Mode. Defining and maintaining an exposure by hand for each of those would not have been practical.

Exposures don’t detect potential breakages during code changes

One big reason to have visibility into the downstream data uses is to prevent breaking data products when changing dbt code upstream. While exposures make defined dependencies visible in dbt docs, someone still has to go through a (sometimes giant) graph of dependencies to identify potential breaking changes.
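The graph walk itself can be scripted. The sketch below assumes a mapping from each node to its direct children, similar in shape to the child_map section of dbt's manifest.json; the node IDs, the my_project name, and the function name impacted_exposures are hypothetical. It performs a breadth-first traversal from a changed model and collects every exposure reachable downstream.

```python
from collections import deque


def impacted_exposures(child_map, changed_node):
    """Return all exposures reachable downstream of changed_node."""
    seen = {changed_node}
    queue = deque([changed_node])
    hits = set()
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child in seen:
                continue
            seen.add(child)
            # Exposure unique IDs are assumed to start with "exposure.".
            if child.startswith("exposure."):
                hits.add(child)
            queue.append(child)
    return sorted(hits)


# Hypothetical node -> direct children mapping.
child_map = {
    "model.my_project.stg_transactions": ["model.my_project.fct_transactions"],
    "model.my_project.fct_transactions": [
        "exposure.my_project.report_daily_kpi",
        "model.my_project.agg_revenue",
    ],
    "model.my_project.agg_revenue": ["exposure.my_project.finance_sync"],
    "model.my_project.dim_customers": ["exposure.my_project.report_daily_kpi"],
}

print(impacted_exposures(child_map, "model.my_project.stg_transactions"))
```

Even with the traversal automated, the result covers only exposures that someone has declared, and it says nothing about whether a given code change actually alters the data those exposures consume.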

Automating exposures with dbt + Datafold

Datafold complements dbt with automated column-level lineage that integrates with all major BI tools. Unlike exposures (which need to be defined manually) and dbt’s own data lineage (which is limited to dbt-project assets), Datafold relies on full semantic parsing of SQL logs from your data warehouse and combines that with metadata from BI tools to form a complete dependency graph that covers the entire data warehouse, including, but not limited to, data models and BI assets.

Lineage in Datafold that goes from source to data app assets; in this case, a reverse ETL sync in Hightouch

Furthermore, when integrated into CI, Datafold automatically computes data diffs that show how the data changes when dbt code changes, and identifies impacted downstream applications such as Looker or Tableau dashboards directly in the pull request.

Datafold automatically adds a comment to your dbt PR indicating data differences between your prod and dev tables, as well as potentially impacted downstream data app dependencies
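To make the idea of a data diff concrete, here is a deliberately naive in-memory sketch. It is not Datafold's implementation and does not use any Datafold API; it simply compares two small lists of rows, standing in for a prod table and a dev table, by a primary key. The function name diff_rows and the sample rows are invented for the example.

```python
def diff_rows(prod_rows, dev_rows, key):
    """Compare two row sets by primary key and summarize the differences."""
    prod = {row[key]: row for row in prod_rows}
    dev = {row[key]: row for row in dev_rows}

    added = sorted(dev.keys() - prod.keys())      # keys only in dev
    removed = sorted(prod.keys() - dev.keys())    # keys only in prod

    # For keys present on both sides, list the columns whose values differ.
    changed = {}
    for k in sorted(prod.keys() & dev.keys()):
        columns = sorted(
            col
            for col in prod[k].keys() | dev[k].keys()
            if prod[k].get(col) != dev[k].get(col)
        )
        if columns:
            changed[k] = columns

    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample rows for a prod table and its dev counterpart.
prod_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
dev_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 275},
    {"id": 4, "amount": 40},
]

result = diff_rows(prod_rows, dev_rows, key="id")
print(result)
```

A real comparison would run inside the warehouse rather than pulling rows into memory, but the output has the same flavor: which rows appeared, which disappeared, and which columns changed.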

Conclusion

dbt Exposures are code-defined extensions of your dbt project that identify downstream data assets (like a reverse ETL sync, BI dashboard, or data science model), and they are a useful way to extend your default dbt DAG. Each exposure must be defined and maintained manually in YAML. While exposures can be an effective way to understand the downstream use of your dbt models, they can be challenging to implement at scale.