
Incremental Data Models in dbt: Append, Merge, and Snapshot Strategies
dbt incremental models load only new or changed rows, so you don’t rebuild a full table on every run. That makes pipelines faster, lowers warehouse cost, and helps large tables stay fresh. In practice, most teams choose between three patterns: append for immutable data, merge for rows that change, and snapshots for history.
The right choice depends on one simple fact: what your source data does after it lands. If records never change, keep it simple. If they change, or if you need a historical trail, you need stronger logic.
Key Points
- Append adds new rows and leaves old rows untouched.
- Merge updates existing rows when a matching key already exists.
- Snapshots keep past versions so you can see how data changed.
- Incremental loading works best on large tables with predictable change patterns.
- Bad keys and late-arriving records are the fastest way to break an incremental model.
Quick summary: dbt incremental loading speeds up warehouse jobs by processing the delta, not the full table. The best strategy depends on whether your rows stay fixed, change in place, or need historical tracking.
Key takeaway: Pick your strategy based on data behavior first. Performance matters, but correctness matters more.
Quick promise: By the end, you’ll have a simple rule for choosing append, merge, or snapshots in your next dbt model.
What dbt incremental models do and when they make sense
Incremental models in dbt compare incoming source data with the target table and process only the rows that need attention. On a small table, that may not matter much. On a fact table with millions of rows, it can cut runtime and cost in a big way.
The basic idea behind incremental loading
A full refresh rebuilds everything. An incremental run handles only the delta.
If you load daily orders, a full refresh reprocesses every past order each day. An incremental model usually loads only today’s new orders, or today’s changes plus a small lookback window.
Why teams choose incremental models instead of full refreshes
Teams switch when full rebuilds start hurting. Runs take too long, costs climb, and freshness slips.
Still, incremental models aren’t magic. You need a reliable unique key for updates, clear rules for late data, and a source system that exposes change signals such as created or updated timestamps.
How the append strategy works in dbt
Append is the simplest pattern. dbt inserts new rows into the target table and never revisits old ones. Because of that, append is usually the fastest option and the easiest to reason about.
Best cases for append-only tables
Append works well for immutable data. Event logs, clickstream data, application logs, and raw ingestion layers are strong fits because each row describes something that already happened and won’t be edited later.
Where append can cause trouble
Problems start when the source isn’t truly append-only. If an order status changes, an append model won’t correct the old row. If a source retries the same record, you can create duplicates.
Append is fast because it ignores the past. That same strength becomes a weakness when the past changes.
For business tables that get corrected, canceled, or enriched later, append usually creates cleanup work downstream.
Why the merge strategy is the most flexible choice for changing data
Merge handles the messier reality most business data lives in. Rows can arrive for the first time, then change later. A customer updates an email address. An order moves from “pending” to “shipped.” A subscription gets canceled. In those cases, the latest state matters.
How dbt merge uses unique keys and matching logic
The dbt merge strategy uses a unique_key to decide whether an incoming row is new or already exists. If the key matches an existing row, dbt updates it. If no match exists, dbt inserts it.
That key must be stable and truly unique. If it isn’t, merge can overwrite the wrong row or create duplicates.
When merge is better than append
Merge is a better fit when records change over time and you want one current version in the table. It’s common for customer dimensions, order tables, billing records, inventory, and support tickets.
Because merge touches existing rows, it’s heavier than append. On warehouses like Snowflake, BigQuery, Redshift, or Databricks, that extra work is often worth it because the result is cleaner and more trustworthy.
Common merge mistakes to avoid
Most merge issues start with bad source hygiene. Duplicate keys in the incoming data, weak deduping logic, and missing update columns can all produce wrong results.
Performance can also suffer on huge tables. Partitioning, clustering, and tight incremental filters help keep merge jobs from scanning more data than needed.
How snapshots help you track history over time
Snapshots solve a different problem. They don’t simply keep the latest row current. They preserve earlier versions so you can answer historical questions later.
If you need to know what a customer’s plan was last month, or when a product price changed, snapshots are a better choice than merge. That’s why dbt snapshots show up in slowly changing dimensions, audit trails, compliance reporting, and point-in-time analysis.
The two main snapshot patterns, timestamp and check
The timestamp pattern looks at a column such as updated_at. When that value changes, dbt records a new version.
The check pattern compares one or more columns directly. If any tracked value changes, dbt stores a new row version.
When you need snapshots instead of merge
Choose snapshots when history is the point. Merge tells you the current truth. Snapshots tell you how that truth changed over time.
That matters for audit work, customer lifecycle analysis, pricing history, and any report that needs “as of” logic.
What to watch out for with snapshots
Snapshots depend on reliable change detection. If the source fails to update its timestamp, or if you track the wrong columns, you can miss history.
Storage also grows over time. That’s expected, but you should plan for it.
How to choose between append, merge, and snapshot strategies
Start with data behavior, not syntax. This table makes the choice easier:
| Strategy | Best for | Needs unique key | Keeps history |
| Append | Immutable events and logs | No, but deduping may help | No |
| Merge | Rows that can change | Yes | No |
| Snapshot | Tables where history matters | Usually yes | Yes |
The rule of thumb is simple. Use append when rows never change. Use merge when rows change and you want the latest state. Use snapshots when you need past states too.
A simple decision guide for your next model
Ask three questions. Does the row change after it lands? Do you need historical versions? Can you identify each row with a stable key?
Those answers usually decide the strategy in under a minute.
How performance and cost should influence the choice
Append is usually cheapest. Merge costs more because it matches and updates. Snapshots add storage because they keep multiple versions.
Still, the cheapest option isn’t always the right one. A fast model with wrong data is expensive in a different way.
Best practices that keep incremental models reliable
Good incremental models depend on data quality as much as dbt config. Clear assumptions, tested keys, and documented behavior matter more than clever SQL.
Tests and checks that catch bad data early
Use unique and not_null tests on keys that drive merge logic. Add freshness checks on sources when timing matters. If duplicates can enter upstream, test for them before they hit your model.
How to handle late-arriving or corrected records
Late data is common, so plan for it. A small lookback window can reprocess the last few days of data. When corrections happen often, merge is safer than append because it can update existing rows instead of stacking bad history.
Also document the model’s contract. State whether it keeps only the latest record, whether it reprocesses past days, and what assumptions it makes about source updates.
Glossary
Incremental loading: Processing only new or changed rows instead of rebuilding a full table.
Full refresh: Rebuilding the entire target table from scratch.
Unique key: A column or column set that identifies one row.
Append: An incremental pattern that inserts new rows only.
Merge: An incremental pattern that inserts new rows and updates matched rows.
Snapshot: A dbt feature that stores row history over time.
Late-arriving data: Records that show up after the expected load window.
Point-in-time reporting: Reporting based on how data looked at a past moment.
FAQ
Are dbt snapshots the same as incremental models?
No. Snapshots and incremental models solve different problems. Incremental models help you process less data during each run. Snapshots preserve old versions of rows so you can analyze history later.
When should I use a full refresh instead of incremental loading?
Use a full refresh when logic changes, backfills are large, or the table is still small. It’s also useful when source data quality is messy and you need to rebuild cleanly from the start.
Do I need a unique key for every dbt strategy?
No. Append can work without one if the source is truly append-only. Merge usually needs a stable unique key, and snapshots usually work better with one because it ties multiple versions back to the same business entity.
How do I handle late-arriving data in dbt?
Use a lookback window or reprocess recent partitions. If records can change after arrival, merge is usually safer than append because it can correct existing rows instead of adding another copy.
Does the dbt merge strategy work the same on every warehouse?
The core idea is the same, but adapter support and SQL behavior vary across warehouses. Check your adapter docs for details, especially around merge support, partitioning, and performance tuning.
Conclusion
Append works best for immutable data. Merge works best when rows change. Snapshots work best when history matters.
That single decision, matching model logic to data behavior, prevents most dbt incremental mistakes. If you want to practice this on real projects, Data Engineer Academy’s SQL Course is a practical next step, and a strong follow-up topic is dbt tests and source freshness.

