Modeling Slowly Changing Dimensions with dbt

Historical accuracy is vital for analytics. Whether you are tracking changes in customer attributes or monitoring product updates over time, modeling Slowly Changing Dimensions (SCD), especially Type 2, is a core technique for building reliable data pipelines. dbt (data build tool) makes SCD Type 2 efficient to implement and easier to maintain as the project grows.

This guide shows how to use dbt to model Slowly Changing Dimensions (SCD Type 2), from the core concepts to a production-ready model.


Understanding Slowly Changing Dimensions (SCD)

Slowly Changing Dimensions are data warehousing techniques for handling changes in dimension attributes over time. The history-preserving types keep older records and track changes explicitly instead of overwriting existing data.

Types of Slowly Changing Dimensions

  • Type 1: Overwrites old data without tracking historical changes.
  • Type 2: Preserves historical data by inserting new rows.
  • Type 3: Stores a limited history by adding new columns (e.g., previous value).
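To make the three types concrete, here is a minimal sketch of how the same email change could be applied by hand under each type. The table dim_customer and its columns (valid_from, valid_to, previous_email) are hypothetical names chosen for illustration, and the three statements are alternatives, not a script to run in sequence.

```sql
-- Type 1: overwrite the value in place; the old email is lost.
update dim_customer
set email = 'new@example.com'
where customer_id = 42;

-- Type 2: close the current row, then insert a new version.
update dim_customer
set valid_to = current_timestamp
where customer_id = 42
  and valid_to is null;

insert into dim_customer (customer_id, email, valid_from, valid_to)
values (42, 'new@example.com', current_timestamp, null);

-- Type 3: keep only the previous value in an extra column.
update dim_customer
set previous_email = email,
    email = 'new@example.com'
where customer_id = 42;
```

dbt snapshots automate the Type 2 pattern, so you do not write these statements yourself.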

Why SCD Type 2 Is Essential for Historical Tracking

SCD Type 2 provides a complete timeline of changes, which is invaluable for:

  • Customer behavior analysis
  • Regulatory compliance (audit trails)
  • Personalized marketing and segmentation

How dbt Helps Implement SCD Type 2

dbt excels at transforming raw data into clean, analytics-ready tables. With features like snapshots, macros, and modular SQL, dbt offers a seamless workflow for implementing SCD Type 2.


Preparing Your Environment for SCD Type 2 in dbt

Before you start, make sure you have:

  • A working dbt project
  • A defined source table with a reliable unique identifier and update timestamp
  • Optional: dbt_utils package for enhanced transformations

Key Concepts in dbt for SCD Type 2

You’ll primarily work with:

  • snapshots: To detect and record changes
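  • updated_at: Source timestamp that the timestamp strategy uses to detect changes
  • dbt_valid_from and dbt_valid_to: Columns dbt adds to each snapshot row; a null dbt_valid_to marks the current record, and an is_active or is_current flag can be derived from it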
  • unique_key: To identify dimension members

Setting Up Source Data with Effective Dates

Your source table should include:

  • customer_id (unique identifier, used as the unique_key in the snapshot)
  • updated_at (timestamp of last update)
  • Business attributes (e.g., email, name, region)

Creating the Initial Snapshot Table

Use the snapshots/ directory in dbt and define your snapshot logic like so:

{% snapshot scd_customer_snapshot %}
{{
  config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at'
  )
}}

select * from {{ source('crm', 'customers') }}

{% endsnapshot %}

Configuring Snapshots for Historical Change Tracking

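On each run, the snapshot compares the source query to the latest version stored for each unique_key. With the timestamp strategy, a row counts as changed when its updated_at is later than the stored value. With the check strategy, it counts as changed when any column listed in check_cols differs. dbt then sets dbt_valid_to on the previous version and inserts a new row with a fresh dbt_valid_from.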
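If the source has no reliable updated_at column, the check strategy is an alternative. The following is a minimal sketch, assuming the source exposes name, email, and region columns; the snapshot name scd_customer_snapshot_check is a hypothetical name used only for this example.

```sql
{% snapshot scd_customer_snapshot_check %}
{{
  config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='check',
    check_cols=['name', 'email', 'region']
  )
}}

{# Assumed columns: only the key and the tracked attributes are selected. #}
select
  customer_id,
  name,
  email,
  region
from {{ source('crm', 'customers') }}

{% endsnapshot %}
```

The timestamp strategy is usually preferable when a trustworthy updated_at exists, because the check strategy has to compare every listed column on each run.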


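Understanding dbt’s Snapshot Metadata Fields and updated_at

  • dbt_valid_from and dbt_valid_to: The period during which a row version was valid. By default, dbt_valid_to is null for the most current version.
  • updated_at: The source column dbt compares to decide whether a change occurred.

dbt does not add an is_active column; you derive that flag from dbt_valid_to. You can query only the current records like this: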

select * from {{ ref('scd_customer_snapshot') }}
where dbt_valid_to is null
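The same validity columns also support point-in-time lookups. As a sketch, the query below returns each customer as they looked at a chosen moment; the literal date is an arbitrary example value.

```sql
-- Row versions that were valid at the start of 2024-01-01.
select *
from {{ ref('scd_customer_snapshot') }}
where dbt_valid_from <= cast('2024-01-01' as timestamp)
  and (
    dbt_valid_to > cast('2024-01-01' as timestamp)
    or dbt_valid_to is null
  )
```

The upper bound is exclusive because dbt closes one version at the same timestamp at which the next version starts.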

Defining Unique Keys for SCD Type 2 Logic

Choosing the right unique_key is crucial. It should:

  • Not change over time
  • Be consistently populated
  • Uniquely identify a business entity (like customer_id)

Implementing a dbt Model for SCD Type 2

Once your snapshot is configured, build a model to reference it:

select
  customer_id,
  name,
  email,
  dbt_valid_from as effective_from,
  dbt_valid_to as effective_to,
  case when dbt_valid_to is null then true else false end as is_current
from {{ ref('scd_customer_snapshot') }}
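A typical use of this model is joining facts to the dimension version that was valid when the event happened. The sketch below assumes the model above is saved as customer_dimension and that a fact model named fct_orders with order_id, customer_id, and ordered_at columns exists; those fact names are hypothetical.

```sql
select
  o.order_id,
  o.ordered_at,
  c.customer_id,
  c.email as email_at_order_time
from {{ ref('fct_orders') }} as o
left join {{ ref('customer_dimension') }} as c
  on o.customer_id = c.customer_id
  -- Pick the version whose validity window contains the order time.
  and o.ordered_at >= c.effective_from
  and (o.ordered_at < c.effective_to or c.effective_to is null)
```

A left join keeps orders that have no matching dimension version, which makes gaps in the history visible instead of silently dropping rows.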

Using dbt_utils and Macros to Simplify Logic

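The dbt_utils package includes helpful macros such as generate_surrogate_key (named surrogate_key before dbt_utils 1.0), which hashes a list of columns into a single key:

{{ dbt_utils.generate_surrogate_key(['customer_id', 'email', 'region']) }}

The hash changes whenever any of the listed columns changes, so comparing it between loads is one way to detect a changed row. It does not trigger anything by itself; in a snapshot, change detection comes from the configured strategy.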
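Another common use is a key that identifies each row version in the dimension. The sketch below uses dbt_utils.generate_surrogate_key (the macro name since dbt_utils 1.0) and assumes the package is installed; customer_version_key is a hypothetical column name.

```sql
select
  -- One key per version: the business key plus the version start time.
  {{ dbt_utils.generate_surrogate_key(['customer_id', 'dbt_valid_from']) }} as customer_version_key,
  customer_id,
  name,
  email,
  dbt_valid_from as effective_from,
  dbt_valid_to as effective_to
from {{ ref('scd_customer_snapshot') }}
```

dbt snapshots already add a dbt_scd_id column that is unique per row, so a custom key like this is mainly useful when you want control over how it is built.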


Applying Business Logic for Updates and Inserts

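Customize logic for soft deletes or field-specific changes using conditional expressions in the model that reads the snapshot. For example, to exclude customers flagged as inactive (assuming the source has a status column):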

where status != 'inactive'
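Hard deletes need separate handling, because a row that disappears from the source is otherwise left open in the snapshot. As a sketch, the snapshot config below adds invalidate_hard_deletes, which makes dbt set dbt_valid_to on rows that no longer exist in the source. Newer dbt versions also offer a hard_deletes setting, so check which option your version supports.

```sql
{% snapshot scd_customer_snapshot %}
{{
  config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at',
    invalidate_hard_deletes=True
  )
}}

{# Rows missing from this query on a later run are closed, not removed. #}
select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
```

With this option the snapshot query must return the full set of live rows; filtering it would make the filtered rows look deleted.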

Setting up Tests and Validations in dbt

To validate your model:

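  • Use a not_null test on customer_id. A plain unique test on customer_id fails by design, because Type 2 keeps several rows per customer
  • Test that each combination of customer_id and effective_from is unique
  • Test that each customer_id has exactly one row where is_current = true

Example:

version: 2
models:
  - name: customer_dimension
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - customer_id
            - effective_from
    columns:
      - name: customer_id
        tests:
          - not_null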
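The check that each customer has only one current row can be written as a singular test, which is a SQL file in the tests/ directory that fails when it returns any rows. This is a minimal sketch, assuming the model is named customer_dimension and exposes the is_current column; the file name is hypothetical.

```sql
-- tests/assert_one_current_row_per_customer.sql
-- Returns customers with more than one current row; any result fails the test.
-- Customers with zero current rows (for example, invalidated deletes) are not flagged.
select
  customer_id,
  count(*) as current_rows
from {{ ref('customer_dimension') }}
where is_current
group by customer_id
having count(*) > 1
```

Running dbt test picks up this file together with the tests declared in YAML.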

Automating Snapshot Runs with dbt Cloud or Scheduler

Schedule your snapshot runs daily/hourly using:

  • dbt Cloud jobs
  • Airflow
  • Prefect

Command:

dbt snapshot

Handling Schema Changes in Dimension Tables

To manage changes in schema:

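  • Rely on snapshots to add new source columns automatically on the next run; columns removed from the source stay in the snapshot table
  • Use on_schema_change = append_new_columns in the config of incremental models built on the snapshot (the setting applies to incremental models, not to snapshots)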
  • Audit schema changes in your logs

Monitoring and Troubleshooting SCD Models in dbt

Use:

  • dbt docs generate and dbt docs serve to build and view model lineage
  • logs/dbt.log for error tracing, and dbt debug to check the connection and project setup
  • dbt Cloud’s UI for job monitoring

Performance Tips for SCD Type 2 in dbt

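  • Partition or cluster large snapshot tables (for example on updated_at or dbt_valid_to) where your warehouse supports it
  • Limit how much history downstream models scan by filtering on the validity window you need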
  • Avoid wide joins; keep snapshots narrow

Common Mistakes to Avoid

  • Not setting unique_key correctly
  • Forgetting updated_at for timestamp strategy
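  • Wiping history by dropping the snapshot table or by running --full-refresh on a hand-built incremental SCD model; lost history cannot be rebuilt from the source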

Best Practices for SCD Type 2 Modeling

  • Always document source freshness
  • Add metadata columns (source_system, load_time)
  • Use surrogate keys to identify each row version

Real-World Use Case: Customer Dimension Tracking

A SaaS company uses SCD Type 2 to track customer changes:

  • Email changes over time
  • Subscription level upgrades
  • Regional migrations

Their marketing and analytics team gains full visibility into the customer lifecycle.


Result: Data Integrity and Historical Insights

Using dbt for SCD Type 2 provides:

  • A trustworthy data trail
  • Accurate reporting on historical states
  • Compliance-ready audit logs

FAQs

What is SCD Type 2?
SCD Type 2 is a data warehousing technique that preserves history by inserting new records instead of updating existing ones.

How do I implement SCD Type 2 in dbt?
Use dbt’s snapshot functionality with strategy='timestamp' and define a unique_key and updated_at.

What fields are needed for SCD Type 2?
A stable unique key, an updated_at timestamp, and business attributes you want to track.

Does dbt support SCD Type 2 out of the box?
Yes. dbt snapshots provide built-in support for SCD Type 2, and macros can help with the surrounding logic.

Can I use macros for SCD Type 2 in dbt?
Absolutely! Use macros like generate_surrogate_key (surrogate_key before dbt_utils 1.0) and custom logic to simplify your implementation.

What’s the difference between SCD Type 1 and Type 2?
Type 1 overwrites old data, while Type 2 adds new rows to preserve history.