dbt model slowly changing dimensions
Historical accuracy matters for analytics. Whether you are tracking changes in customer attributes or monitoring product updates over time, modeling Slowly Changing Dimensions (SCD), especially Type 2, is a core technique for building reliable data pipelines. With dbt (data build tool), SCD Type 2 is straightforward to implement and maintain.
This guide shows how to use dbt to model Slowly Changing Dimensions (SCD Type 2), from the core concepts to a production-ready model.
Understanding Slowly Changing Dimensions (SCD)
Slowly Changing Dimensions are techniques in data warehousing used to manage changes in dimension attributes over time. Depending on the type, you either overwrite old values or preserve older records and track changes explicitly.
Types of Slowly Changing Dimensions
- Type 1: Overwrites old data without tracking historical changes.
- Type 2: Preserves historical data by inserting new rows.
- Type 3: Stores a limited history by adding new columns (e.g., previous value).
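To make the three types concrete, here is a minimal sketch in plain SQL of how each one might handle the same email change. The table dim_customer, the columns valid_from, valid_to and previous_email, and customer 42 are hypothetical names used only for illustration. The three statements groups are alternatives, not steps to run in sequence.

```sql
-- Type 1: overwrite in place; the old email is lost.
update dim_customer
set email = 'new@example.com'
where customer_id = 42;

-- Type 2: close the current row, then insert a new version.
update dim_customer
set valid_to = current_timestamp
where customer_id = 42
  and valid_to is null;

insert into dim_customer (customer_id, email, valid_from, valid_to)
values (42, 'new@example.com', current_timestamp, null);

-- Type 3: keep only the previous value in an extra column.
update dim_customer
set previous_email = email,
    email = 'new@example.com'
where customer_id = 42;
```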
Why SCD Type 2 Is Essential for Historical Tracking
SCD Type 2 provides a complete timeline of changes, which is invaluable for:
- Customer behavior analysis
- Regulatory compliance (audit trails)
- Personalized marketing and segmentation
How dbt Helps Implement SCD Type 2
dbt excels at transforming raw data into clean, analytics-ready tables. With features like snapshots, macros, and modular SQL, dbt offers a seamless workflow for implementing SCD Type 2.
Preparing Your Environment for SCD Type 2 in dbt
Before you start, make sure you have:
- A working dbt project
- A defined source table with a reliable unique identifier and update timestamp
- Optional: the dbt_utils package for enhanced transformations
Key Concepts in dbt for SCD Type 2
You’ll primarily work with:
- snapshots: To detect and record changes
- updated_at: Timestamp to track changes
- is_active: Flag to indicate the current record (dbt does not generate this column; derive it from dbt_valid_to being null)
- unique_key: To identify dimension members
Setting Up Source Data with Effective Dates
Your source table should include:
- id (unique identifier; customer_id in this guide's examples)
- updated_at (timestamp of last update)
- Business attributes (e.g., email, name, region)
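Before snapshotting, it can help to confirm that the key really is unique and that updated_at is populated. The query below is a rough sketch of such a check, assuming the crm.customers source and the customer_id key used in this guide; it could be run as a dbt analysis or adapted into a test.

```sql
-- Any row returned points to a duplicate key or a missing timestamp,
-- either of which would make the snapshot unreliable.
select
    customer_id,
    count(*) as row_count,
    sum(case when updated_at is null then 1 else 0 end) as missing_updated_at
from {{ source('crm', 'customers') }}
group by customer_id
having count(*) > 1
    or sum(case when updated_at is null then 1 else 0 end) > 0
```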
Creating the Initial Snapshot Table
Use the snapshots/ directory in dbt and define your snapshot logic like so:
{% snapshot scd_customer_snapshot %}
{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}
select * from {{ source('crm', 'customers') }}
{% endsnapshot %}
Configuring Snapshots for Historical Change Tracking
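On each run, the snapshot compares the source table to the last recorded version of each row. With the timestamp strategy, a row counts as changed when its updated_at value is newer than the one recorded; with the check strategy, dbt compares the values of the columns listed in check_cols. When a change is found, dbt sets dbt_valid_to on the previous row and inserts a new row with a fresh dbt_valid_from.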
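Where the source has no reliable update timestamp, dbt's check strategy is an alternative that compares column values directly. The sketch below assumes the same crm.customers source; the snapshot name scd_customer_snapshot_check and the choice of tracked columns are illustrative.

```sql
{% snapshot scd_customer_snapshot_check %}
{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='check',
        check_cols=['email', 'name', 'region']
    )
}}
-- A new version is recorded only when one of the check_cols changes.
select * from {{ source('crm', 'customers') }}
{% endsnapshot %}
```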
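Understanding dbt’s is_active and updated_at Fields
- is_active: Marks whether a record is the most current version. dbt snapshots do not create a column with this name; the current version of a record is the row whose dbt_valid_to is null.
- updated_at: Used to compare whether a change occurred.
dbt itself adds the columns dbt_valid_from, dbt_valid_to, dbt_scd_id, and dbt_updated_at to every snapshot table.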
You can query only active records like this:
select * from {{ ref('scd_customer_snapshot') }}
where dbt_valid_to is null
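To see the full timeline rather than only the current rows, the same snapshot can be queried per customer. This is a small sketch; customer 42 is a hypothetical value, and dbt_valid_from and dbt_valid_to are the validity columns dbt adds to snapshot tables.

```sql
select
    customer_id,
    email,
    region,
    dbt_valid_from,
    dbt_valid_to
from {{ ref('scd_customer_snapshot') }}
where customer_id = 42  -- hypothetical customer
order by dbt_valid_from
```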
Defining Unique Keys for SCD Type 2 Logic
Choosing the right unique_key is crucial. It should:
- Not change over time
- Be consistently populated
- Uniquely identify a business entity (like customer_id)
Implementing a dbt Model for SCD Type 2
Once your snapshot is configured, build a model to reference it:
select
    customer_id,
    name,
    email,
    dbt_valid_from as effective_from,
    dbt_valid_to as effective_to,
    case when dbt_valid_to is null then true else false end as is_current
from {{ ref('scd_customer_snapshot') }}
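A common use of such a model is a point-in-time join, where each fact row picks up the dimension version that was valid when the event happened. The sketch below assumes the model above is named customer_dimension (the name used in this guide's test example) and a hypothetical orders model with order_id, customer_id, and ordered_at columns.

```sql
select
    o.order_id,
    o.ordered_at,
    d.customer_id,
    d.email as email_at_order_time
from {{ ref('orders') }} as o  -- hypothetical fact model
join {{ ref('customer_dimension') }} as d
    on o.customer_id = d.customer_id
    -- match the version whose validity window contains the order time
    and o.ordered_at >= d.effective_from
    and (o.ordered_at < d.effective_to or d.effective_to is null)
```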
Using dbt_utils and Macros to Simplify Logic
The dbt_utils package includes helpful macros such as generate_surrogate_key (named surrogate_key before dbt_utils 1.0), which can generate unique hashes for tracking changes:
{{ dbt_utils.generate_surrogate_key(['customer_id', 'email', 'region']) }}
The hash changes whenever any of the listed columns changes, so comparing it between loads is a compact way to detect that a tracked column was updated.
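The same macro can also give each version of a customer its own key in the dimension model. The following is a sketch, assuming dbt_utils 1.0 or later, where the macro is called generate_surrogate_key (earlier releases call it surrogate_key); the column name customer_version_key is an illustrative choice.

```sql
select
    -- one hash per (customer, version start), so each version gets its own key
    {{ dbt_utils.generate_surrogate_key(['customer_id', 'dbt_valid_from']) }} as customer_version_key,
    customer_id,
    name,
    email,
    dbt_valid_from as effective_from,
    dbt_valid_to as effective_to
from {{ ref('scd_customer_snapshot') }}
```

The snapshot's own dbt_scd_id column is also unique per row and may be enough if a custom key is not needed.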
Applying Business Logic for Updates and Inserts
Customize logic for soft deletes or field-specific changes using conditional expressions. For example:
where status != 'inactive'
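A filter like this stops excluded rows from being captured, but on its own it leaves the last recorded version open. One possible approach, sketched below, combines the filter with the invalidate_hard_deletes snapshot option, which sets dbt_valid_to on rows that no longer appear in the snapshot query. This assumes a dbt version that supports that option (newer releases offer a hard_deletes setting instead, so check the documentation for the version you run); the snapshot name is hypothetical.

```sql
{% snapshot scd_customer_snapshot_active %}
{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at',
        invalidate_hard_deletes=True
    )
}}
-- Customers that become inactive drop out of this query,
-- and their last version is then closed by dbt.
select *
from {{ source('crm', 'customers') }}
where status != 'inactive'
{% endsnapshot %}
```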
Setting up Tests and Validations in dbt
To validate your model:
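- Use a not_null test on customer_id
- Use a unique test on the combination of customer_id and effective_from; customer_id alone repeats once per version in a Type 2 table
- Test that is_current = true has only one record per customer_id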
Example:
version: 2
models:
  - name: customer_dimension
    columns:
      - name: customer_id
        tests:
          - not_null
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - customer_id
            - effective_from
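The rule that each customer has exactly one current record can be expressed as a singular test, which is a SQL file in the tests/ directory that passes when it returns no rows. This is a minimal sketch; the file name is hypothetical, and it assumes the customer_dimension model exposes the is_current column defined earlier.

```sql
-- tests/assert_one_current_row_per_customer.sql (hypothetical file name)
-- Returns the customers that have more than one current row.
select
    customer_id,
    count(*) as current_rows
from {{ ref('customer_dimension') }}
where is_current = true
group by customer_id
having count(*) > 1
```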
Automating Snapshot Runs with dbt Cloud or Scheduler
Schedule your snapshot runs daily/hourly using:
- dbt Cloud jobs
- Airflow
- Prefect
Command:
dbt snapshot
Handling Schema Changes in Dimension Tables
To manage changes in schema:
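- Snapshots pick up new source columns automatically on the next run; columns removed from the source are kept in the snapshot table
- For incremental models built downstream, use on_schema_change='append_new_columns' in config
- Audit schema changes in your logs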
Monitoring and Troubleshooting SCD Models in dbt
Use:
- dbt docs generate (followed by dbt docs serve) to view model lineage
- Logs for error tracing, and dbt debug to check the connection and project configuration
- dbt Cloud’s UI for job monitoring
Performance Tips for SCD Type 2 in dbt
- Use partitioning by updated_at
- Limit snapshot lookback window
- Avoid wide joins; keep snapshots narrow
Common Mistakes to Avoid
- Not setting unique_key correctly
- Forgetting updated_at for timestamp strategy
- Overwriting history accidentally with full-refresh
Best Practices for SCD Type 2 Modeling
- Always document source freshness
- Add metadata columns (source_system, load_time)
- Use surrogate keys for tracking changes
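As one way to add the metadata columns mentioned above, they can be attached in the dimension model that reads from the snapshot. This is a sketch only: the literal 'crm' is an assumed label for the source system, and load_time here records when the model was built rather than when the source row was loaded.

```sql
select
    customer_id,
    name,
    email,
    dbt_valid_from as effective_from,
    dbt_valid_to as effective_to,
    'crm' as source_system,          -- assumed label for the source
    current_timestamp as load_time   -- time this model was built
from {{ ref('scd_customer_snapshot') }}
```

Adding these columns in the model rather than in the snapshot query keeps a value that changes on every run out of the tracked history.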
Real-World Use Case: Customer Dimension Tracking
A SaaS company uses SCD Type 2 to track customer changes:
- Email changes over time
- Subscription level upgrades
- Regional migrations
Their marketing and analytics team gains full visibility into the customer lifecycle.
Result: Data Integrity and Historical Insights
Using dbt for SCD Type 2 provides:
- A trustworthy data trail
- Accurate reporting on historical states
- Compliance-ready audit logs
FAQs
What is SCD Type 2?
SCD Type 2 is a data warehousing technique that preserves history by inserting new records instead of updating existing ones.
How do I implement SCD Type 2 in dbt?
Use dbt’s snapshot functionality with strategy='timestamp' and define a unique_key and updated_at.
What fields are needed for SCD Type 2?
A stable unique key, an updated_at timestamp, and business attributes you want to track.
Does dbt support SCD Type 2 out of the box?
Yes, using snapshots and macros, dbt provides built-in support for SCD Type 2.
Can I use macros for SCD Type 2 in dbt?
Absolutely! Use macros like generate_surrogate_key (surrogate_key before dbt_utils 1.0) and custom logic to simplify your implementation.
What’s the difference between SCD Type 1 and Type 2?
Type 1 overwrites old data, while Type 2 adds new rows to preserve history.