Slowly Changing Dimensions Type 2
Tips and Tricks

Slowly Changing Dimensions Type 2 with SQL and dbt

A slowly changing dimension type 2 keeps the full history of a dimension row. When a tracked value changes, you close the old row and insert a new one instead of overwriting the past. That matters in analytics because you often need to know what was true on a given date. SQL handles the change logic, and dbt makes the process easier to build, test, and maintain.

Key Points

  • SCD Type 2 stores both current and historical versions of a dimension record.
  • Change detection usually compares incoming rows with the current active row for the same business key.
  • SQL gives you full control over history rules and load order.
  • dbt snapshots reduce repetitive work when the history pattern is standard.
  • Testing matters as much as loading, because duplicate current rows break reporting.

Quick summary: SCD Type 2 keeps old and current versions side by side, so reports can answer what was true at any point in time.

Key takeaway: Good history tracking depends on the right change rules, not only on insert and update logic.

Quick promise: You will leave with a clear sense of when plain SQL is enough and when dbt makes the job easier.

What a slowly changing dimension type 2 actually stores

A dimension table describes business entities such as customers, products, or employees. An SCD Type 2 table stores multiple versions of the same entity over time. One row is current, while older rows stay in place as historical records.

That matters because businesses change. A customer moves, a product changes category, or an employee switches departments. If you overwrite those values, last quarter’s report starts telling today’s story.

The business problem SCD Type 2 solves

Teams often need to answer, “What did we know at the time?” That question shows up in revenue analysis, churn tracking, finance reviews, and audits.

Take a simple case. Customer 42 lived in Boston until March and then moved to Phoenix. January orders should still roll up to Boston. If you overwrite the city in the dimension, every past order joins to Phoenix, and your history becomes wrong.

The columns you usually need for history tracking

Most SCD Type 2 tables rely on a small set of history columns.

ColumnPurpose
surrogate_keyUniquely identifies each version of a row
natural_keyIdentifies the business entity, such as customer_id
effective_dateShows when this version became valid
end_dateShows when this version stopped being valid
is_currentMarks the active version
version_numberTracks first, second, third versions

The rule is simple. One natural key can have many versions, but only one row should be current at a time.

How to build SCD Type 2 logic with SQL

SCD Type 2 in SQL is less about clever syntax and more about the order of operations. You compare incoming rows to active dimension rows, close the old version when something important changes, and insert a new version.

Detecting when a row has changed

Start with staged source data. Then join it to the current dimension row using the natural key, such as customer_id, while filtering to the active record.

Next, compare only the tracked attributes. You can compare them column by column, or you can build a hash from selected fields like city, segment, and status. Hashing helps when many columns matter. Direct comparison is easier to debug.

Leave technical fields out of the rule. Load timestamps, file names, and batch IDs change often, but they should not create new business history.

Closing the old record and inserting the new one

When you find a change, two steps must happen. First, update the old row so it is no longer current. Set its end date and flip the current flag to false.

Second, insert a new row with a fresh surrogate key, a new effective date, and is_current = true. That row becomes the active version for future joins.

If you insert the new version without closing the old one, you can create two active rows for one business key.

SQL example patterns that work well in practice

A reliable SCD Type 2 SQL flow usually looks like this:

  1. Stage and clean the source rows.
  2. Pull the current dimension rows for the same natural keys.
  3. Split records into new, changed, and unchanged groups.
  4. Update changed rows, then insert the new versions.

That pattern works in most warehouses because the logic stays the same even when the syntax shifts. Once the base flow works, you can add late-arriving changes, soft deletes, or custom effective-date rules.

Using dbt to make SCD Type 2 easier to manage

dbt does not change the core SCD Type 2 pattern. It helps you organize the logic, document the model, and test it in a repeatable way. That is useful when a hand-built SQL job starts to grow messy.

When to use dbt snapshots versus a custom model

Snapshots are a good fit when the source has a stable business key and a clear change signal. A custom model is better when the business rules are more involved.

ApproachBest fitMain trade-off
dbt snapshotStandard history tracking with low setup effortLess control over unusual edge cases
custom SQL modelComplex rules, late-arriving data, custom validity logicMore code to maintain and test

If your team needs custom effective dating, multi-source comparisons, or warehouse-specific tuning, plain SQL often gives you more room to work.

A simple dbt snapshot example

A dbt snapshot needs a unique key so it knows which business row to track. Then it needs a rule for deciding when a new version should appear.

The timestamp strategy watches an updated_at field. When that value changes, dbt stores a new version. The check strategy compares chosen columns instead, which is useful when the source does not have a trustworthy update timestamp.

dbt also adds snapshot metadata like dbt_valid_from and dbt_valid_to. Those columns make point-in-time filtering much easier.

Testing and documenting the model in dbt

History tables need tests, because bad history looks believable until a report breaks. At minimum, test for not-null natural keys, unique surrogate keys, and one current row per natural key.

For custom models, add a test for overlapping date ranges. Also document each history column so other analysts know whether end_date is exclusive, inclusive, or null for the active row.

Common SCD Type 2 mistakes and how to avoid them

Most broken history tables fail for a few repeat mistakes. The SQL may look fine, yet a small rule can still damage reporting.

Forgetting to filter only one active record

Your current dimension lookup should return one row per natural key. If it returns two, fact table joins can duplicate rows and inflate totals.

Protect the model with a strong business key, an is_current flag, and a validation test that fails when more than one active row exists.

Using the wrong change rule for slowly changing data

Only track attributes that matter to the business. A plan change or territory change may need history. A batch timestamp should not.

That choice belongs to dimensional modeling, not only to SQL. History should reflect meaningful business change, not system noise.

Breaking point-in-time analysis with bad dates

Effective dates and end dates need one clear rule across the model. Rows should not overlap, and the team should decide whether end dates are exclusive or inclusive.

Time zones can also cause trouble. If one source lands in UTC and another in local time, the same event can appear in the wrong order.

History only works when the date boundaries are clean and consistent.

FAQ

What is the difference between SCD Type 1 and Type 2?

Type 1 overwrites the old value, so only the latest state remains. Type 2 keeps the old row and adds a new row when a tracked value changes. Use Type 1 for fixes you do not need to audit, and Type 2 for changes that matter in reporting.

Should I use dbt snapshots or custom SQL for SCD Type 2?

Use dbt snapshots when the source has a clear key and a reliable change signal. Choose custom SQL when you need late-arriving logic, special validity rules, or tighter control over performance. Many teams start with snapshots and move to custom SQL only when the rules become more complex.

How do I query the current row in an SCD Type 2 table?

Filter for the active version only. That usually means is_current = true or end_date is null, based on your design. Pick one rule and use it everywhere, because mixed patterns create mistakes in joins, dashboards, and downstream models.

How do I query a row as of a past date?

Join on the natural key and filter to the row where the target date falls between the effective date and end date. If current rows use a null end date, handle that case in the filter. Clean date logic is what makes point-in-time analysis reliable.

Which columns should create a new version in an SCD Type 2 model?

Track business attributes that change analysis or grouping, such as city, plan, category, region, or manager. Skip technical fields like load time, source file name, and batch ID. Those values create extra row versions without adding useful business history.

What should I learn next after SCD Type 2?

Study star schemas, fact versus dimension tables, dbt incremental models, and point-in-time joins. If you want guided practice, Data Engineer Academy’s Data Modeling Course is a strong next step. Good follow-up articles include dimensional modeling basics, surrogate keys, and dbt snapshot patterns.

Conclusion

A slowly changing dimension type 2 works because it preserves old truth and current truth at the same time. Use SQL when you need full control over change rules, load order, or edge cases. Use dbt snapshots when the pattern is standard and you want a simpler path to testing and maintenance.

The bigger lesson is simple. Good history leads to better reporting, easier audits, and more trusted analytics.