Request a 30-minute demo

Our product expert will guide you through our demo to show you how to automate testing for every part of your workflow.

See data diffing in real time
Data stack integration
Discuss pricing and features
Get answers to all your questions
Submit your credentials
Schedule date and time
for the demo
Get a 30-minute demo
and see datafold in action
///
September 28, 2022
dbt

Unit Test vs an Integration Test for dbt

Learn what the difference is between a unit test and an integration test for dbt

No items found.
Forrest Bajbek

The data world continues to adopt concepts and terminology from the software world. This article covers what unit tests and integration tests are and how they apply to the data world.

Unit Test

A Unit Test takes a piece of code (usually a function, class, or a combination of the two) and tests the basic functionality of that code.

As a very simple example, consider a function ƒ that adds two numbers. An appropriate unit test would assert that ƒ(1, 2) == 3. But good tests will check edge cases, such as:

  • ƒ(1, NULL) == 1
  • ƒ(1, ‘2’) == NULL
  • ƒ(1, 2.1) == 3.1

Unit tests, in a way, serve as a spec for exactly how your code should behave. Perhaps ƒ(1, ‘2’) should equal 3, casting all inputs to float types if possible. Or perhaps it should throw an error. This may seem trivial, but you can’t predict how even the simplest functions will be used. What you can predict is how it shouldn’t be used.

dbt has a great Unit Testing framework for database tables. With a trivial amount of yaml, you can ensure that a column only contains unique, not null, or a specific list of values.


version: 2
models:
- name: some_table
columns:
- name: some_id
tests:
- unique
- not_null
- name: some_attribute
tests:
- accepted_values:
values: ['some', 'acceptable', 'values']

Integration Test

Integration Tests check whether multiple components of your application work together. Think of it as a higher-level unit test.

A common use case is foreign keys. For example, if the <span class="code">orders</span> table has a foreign key to the <span class="code">customers table</span>, a good integration test asserts that <span class="code">orders.customer_id</span> is a subset of <span class="code">customers.id</span>. Otherwise, the <span class="code">orders</span> table’s referential integrity is compromised (<span class="code">orders.customer_id</span> points to a customer that doesn’t exist).

Again, dbt has excellent tests for ensuring referential integrity. With 3 lines of yaml, you can validate that all records in a child table have a corresponding record in a parent table.


version: 2
models:
- name: orders
columns:
- name: customer_id
tests:
- relationships:
to: ref('customers')
field: id

Another great tool is Great Expectations. Unlike dbt’s yaml-based configuration files, Great Expectations is configured through a Python API, with a new web UI in the works. They have a huge library of Unit and Integration Tests ready to use.