Skip to content

Snapshot with hard_deletes=new_record duplicates deleted data on each run #312

@dubravcik

Description

@dubravcik

Snapshot with hard_deletes = 'new_record' should create a new record with flag dbt_is_deleted='True' when a record is deleted in the source. The new record is created, unfortunately, on each subsequent snapshot run it is created again and again.
In our daily jobs, all deleted records are created as new records every day and it is bloating the snapshot table.

How to reproduce

{% snapshot test_history %}
    {{
        config(
          target_schema='dbo',
          strategy='check',
          check_cols = 'all',
          unique_key='txt',
          hard_deletes = 'new_record'
        )
    }}
SELECT 'A' as txt
UNION ALL
SELECT 'B' as txt
{% endsnapshot %}

Now run snapshot.
Remove the record B

{% snapshot test_history %}
    {{
        config(
          target_schema='dbo',
          strategy='check',
          check_cols = 'all',
          unique_key='txt',
          hard_deletes = 'new_record'
        )
    }}
SELECT 'A' as txt
{% endsnapshot %}

Now run the snapshot twice and check result. Each time you run it, a record is created.

Image

##solution
I think in helpers.sql macro fabric__snapshot_staging_table , the deleted records should be excluded in snapshotted_data CTE e.g. there should be

{%- if strategy.hard_deletes == 'new_record' %}
where {{ columns.dbt_is_deleted }} = 'False'
{% endif -%}

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions