Unit Test vs an Integration Test for dbt
Learn what the difference is between a unit test and an integration test for dbt
The data world continues to adopt concepts and terminology from the software world. This article covers what unit tests and integration tests are and how they apply to the data world.
Unit Test
A Unit Test takes a piece of code (usually a function, class, or a combination of the two) and tests the basic functionality of that code.
As a very simple example, consider a function ƒ that adds two numbers. An appropriate unit test would assert that ƒ(1, 2) == 3. But good tests will check edge cases, such as:
- ƒ(1, NULL) == 1
- ƒ(1, ‘2’) == NULL
- ƒ(1, 2.1) == 3.1
Unit tests, in a way, serve as a spec for exactly how your code should behave. Perhaps ƒ(1, ‘2’) should equal 3, casting all inputs to float types if possible. Or perhaps it should throw an error. This may seem trivial, but you can’t predict how even the simplest functions will be used. What you can predict is how it shouldn’t be used.
dbt has a great Unit Testing framework for database tables. With a trivial amount of yaml, you can ensure that a column only contains unique, not null, or a specific list of values.
Integration Test
Integration Tests check whether multiple components of your application work together. Think of it as a higher-level unit test.
A common use case is foreign keys. For example, if the <span class="code">orders</span> table has a foreign key to the <span class="code">customers table</span>, a good integration test asserts that <span class="code">orders.customer_id</span> is a subset of <span class="code">customers.id</span>. Otherwise, the <span class="code">orders</span> table’s referential integrity is compromised (<span class="code">orders.customer_id</span> points to a customer that doesn’t exist).
Again, dbt has excellent tests for ensuring referential integrity. With 3 lines of yaml, you can validate that all records in a child table have a corresponding record in a parent table.
Another great tool is Great Expectations. Unlike dbt’s yaml-based configuration files, Great Expectations is configured through a Python API, with a new web UI in the works. They have a huge library of Unit and Integration Tests ready to use.