
Orphaned identifiers result in duplicate objects #331

@hancush

Related to #295.

Deleting an object does not delete the identifiers associated with it. When those orphaned identifiers linger, any subsequent object with the same identifier is created as a duplicate. This is a problem when data is removed by accident, or when it's desirable to remove data before a scrape can correct it, e.g., to prevent the spread of erroneous information.

A practical example: We scrape events from the Legistar API and use the unique event ID as an identifier for events. This week, we needed to remove a batch of test events, some with errors, and rely on the scrape to repopulate the events that did not contain errors. Instead, every scrape we ran created another duplicate of each correct event that had been removed.

One option is to hook into the delete signals for the top-level models in python-opencivicdata and remove any associated identifiers when an object is deleted. That wouldn't cover data removed directly at the database level, though, since signals wouldn't fire. A database trigger implemented in a migration could cover removal at both the ORM and database levels, though it would be less obvious to the end user.
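To make the trigger idea concrete, here is a rough sketch using SQLite from the Python standard library so it runs on its own. The `event` and `identifier` tables, their columns, and the trigger name are hypothetical stand-ins, not the actual schema, and a real deployment would need the trigger written for its own database and created in a migration.

```python
import sqlite3


def build_demo_db(with_trigger=True):
    """Create an in-memory database with a hypothetical event/identifier schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE event (id TEXT PRIMARY KEY, name TEXT);
        -- object_id has no foreign key constraint, so deletes do not cascade.
        CREATE TABLE identifier (
            id INTEGER PRIMARY KEY,
            object_id TEXT NOT NULL,
            identifier TEXT NOT NULL
        );
    """)
    if with_trigger:
        # Remove identifiers whenever their event row is deleted,
        # regardless of whether the delete came from an ORM or raw SQL.
        conn.executescript("""
            CREATE TRIGGER event_delete_identifiers
            AFTER DELETE ON event
            FOR EACH ROW
            BEGIN
                DELETE FROM identifier WHERE object_id = OLD.id;
            END;
        """)
    return conn


def orphans_after_delete(with_trigger):
    """Insert one event with one identifier, delete the event, count leftovers."""
    conn = build_demo_db(with_trigger)
    conn.execute("INSERT INTO event VALUES ('ocd-event/1', 'Board meeting')")
    conn.execute(
        "INSERT INTO identifier (object_id, identifier) VALUES (?, ?)",
        ("ocd-event/1", "legistar-1234"),
    )
    conn.execute("DELETE FROM event WHERE id = 'ocd-event/1'")
    count = conn.execute("SELECT COUNT(*) FROM identifier").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    print("orphans without trigger:", orphans_after_delete(False))
    print("orphans with trigger:", orphans_after_delete(True))
```

Without the trigger the identifier row survives the delete; with it, the row is removed along with the event. The tradeoff mentioned above still applies: the cleanup happens out of sight of anyone reading only the Python code.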

In the meantime, this issue can be mitigated by ensuring identifiers are sufficiently unique and by deleting data carefully, but I think a proper fix would be worth considering for a future release.

As ever, thanks for your work!
