New data quality check: deleted entities #122

Open
Ben-Hodgkiss opened this issue Oct 22, 2024 · 1 comment

Overview

The Data Management team needs a new data quality check implemented to detect when data providers have removed records from an endpoint, so that we can reliably identify when entities need to be end-dated.

This is roughly the inverse of the existing unknown entity issue:

  • unknown entity, when there are records in a new resource which can’t be mapped to existing entities
  • deleted entity, when there are existing lookups which can’t be mapped to records on a new resource.

Demo code for one possible approach in the notebook here: https://github.com/digital-land/jupyter-analysis/blob/gs/deleted-entities/analysis/2024-10_deleted_entities/deleted_entities.ipynb

For each provision (e.g. Buckinghamshire article-4-direction-area), this code compares the reference values of all existing entities with the reference values in all currently active resources. Entities whose reference values don't exist in any active resource are flagged as entities which should probably be end-dated.
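
The comparison described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function name `find_deleted_entities`, the dictionary keys, and the sample organisation, reference and entity values are all assumptions made for the example.

```python
from collections import defaultdict


def find_deleted_entities(lookups, resource_records):
    """Return lookups whose reference is missing from the active resources
    of the same provision (organisation + dataset).

    Both arguments are lists of dicts. The key names are hypothetical.
    """
    # Collect the set of references seen in active resources, per provision.
    active_refs = defaultdict(set)
    for record in resource_records:
        provision = (record["organisation"], record["dataset"])
        active_refs[provision].add(record["reference"])

    # A lookup is flagged when its reference is not in that set.
    flagged = []
    for lookup in lookups:
        provision = (lookup["organisation"], lookup["dataset"])
        if lookup["reference"] not in active_refs.get(provision, set()):
            flagged.append(lookup)
    return flagged


# Illustrative data only: these values are made up for the example.
lookups = [
    {"entity": 1001, "organisation": "local-authority:BUC",
     "dataset": "article-4-direction-area", "reference": "A4D-1"},
    {"entity": 1002, "organisation": "local-authority:BUC",
     "dataset": "article-4-direction-area", "reference": "A4D-2"},
]
active_records = [
    {"organisation": "local-authority:BUC",
     "dataset": "article-4-direction-area", "reference": "A4D-1"},
    # A4D-3 has no lookup, so it would be an "unknown entity", not a deleted one.
    {"organisation": "local-authority:BUC",
     "dataset": "article-4-direction-area", "reference": "A4D-3"},
]

flagged = find_deleted_entities(lookups, active_records)
print([lookup["entity"] for lookup in flagged])  # prints [1002]
```

Note that in this sketch a provision with no active resources at all has every one of its entities flagged. A real check may want to treat that case separately, since it could indicate a broken endpoint rather than deliberate deletions.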

Dependencies

Identifying deleted entities is only the first step. Actually end-dating or retiring the entities identified as deleted requires agreeing with the Data Design team how to represent this in the data model, i.e. whether we simply give entities an end-date and, if so, which date we use as the end date.

Tech Approach
To be completed by dev.

Acceptance Criteria/Tests

  • deleted entity issues are recorded in the issues or expectations issues tables.

Resourcing
Are there any tickets that need to be completed before this one can be? Are there any limitations as to who in the team can complete this ticket?

Ben-Hodgkiss converted this from a draft issue Oct 22, 2024