Remote id usage for resources is problematic #217

Open
abulte opened this issue Jun 13, 2022 · 0 comments

We're currently using the remote resource id as our own (local) resource id:

resource = Resource(id=res['id'])

This can be problematic if the id is not unique from udata's point of view. It should not happen within CKAN itself, but it will break in at least the following case:

  • a dataset with a given resource id is harvested, then deleted on remote side
  • the dataset is automatically archived (but not deleted) on udata's side
  • a new dataset appears with the same resource id on the remote side
  • a new dataset and resource are created on udata's side with the same resource id as the previous one, which still exists (archived) on udata's side
  • ➡️ this leads to an id conflict; for example, the stable resource URL will point to the obsolete resource URL
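The scenario above can be reproduced with a toy in-memory model. This is a minimal sketch, not udata code: `Resource`, `Dataset`, `harvest` and `resolve_stable_url` are hypothetical stand-ins, and the stable-URL lookup is assumed to return the first local resource matching an id.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified stand-ins for udata models (not the real udata API).
@dataclass
class Resource:
    id: str
    url: str

@dataclass
class Dataset:
    remote_id: str
    resources: List[Resource] = field(default_factory=list)
    archived: bool = False

# All local datasets in creation order; archived ones are kept, not deleted.
local_datasets: List[Dataset] = []

def harvest(remote: dict) -> Dataset:
    """Current behaviour: the remote resource id becomes the local resource id."""
    dataset = Dataset(remote_id=remote["id"])
    for res in remote["resources"]:
        dataset.resources.append(Resource(id=res["id"], url=res["url"]))
    local_datasets.append(dataset)
    return dataset

def resolve_stable_url(resource_id: str) -> Optional[str]:
    """Assumed stable-URL lookup: returns the first resource found with this id."""
    for dataset in local_datasets:
        for resource in dataset.resources:
            if resource.id == resource_id:
                return resource.url
    return None

# 1. A dataset is harvested.
old = harvest({"id": "ds-1", "resources": [{"id": "r-42", "url": "https://remote.example/old.csv"}]})
# 2. It is deleted remotely, so it is archived (not deleted) locally.
old.archived = True
# 3-4. A new remote dataset reuses resource id "r-42" and gets harvested.
new = harvest({"id": "ds-2", "resources": [{"id": "r-42", "url": "https://remote.example/new.csv"}]})

# 5. Two local resources now share "r-42", and the stable URL hits the obsolete one.
print(resolve_stable_url("r-42"))  # https://remote.example/old.csv
```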

Possible solutions:

  1. stop relying on the remote resource id altogether; instead, use a new attribute resource.extras.harvest:remote_id to map the remote resource to the local one
    resource = get_by(dataset.resources, 'id', UUID(res['id']))
    The local resource would then get an auto-generated resource id, which should be unique ➡️ this is the cleaner fix, but it requires significant code changes and a migration
  2. protect the harvesting process against conflicting IDs: raise an error for a given dataset if it contains a resource id that already exists locally ➡️ easier to implement, but requires a manual action (dataset deletion) to fix the situation
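The two options can be contrasted in a toy model. This is a minimal sketch under stated assumptions, not udata's harvesting code: the dataclasses, `upsert_resource`, `check_no_conflict` and `ResourceIdConflict` are hypothetical, and only the `harvest:remote_id` extras key comes from the proposal above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical, simplified stand-ins for udata models (not the real udata API).
@dataclass
class Resource:
    url: str
    # Option 1: the local id is always auto-generated, independent of the remote portal.
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    extras: Dict[str, str] = field(default_factory=dict)

@dataclass
class Dataset:
    resources: List[Resource] = field(default_factory=list)

REMOTE_ID_KEY = "harvest:remote_id"

def find_by_remote_id(dataset: Dataset, remote_id: str) -> Optional[Resource]:
    """Option 1: match on extras['harvest:remote_id'], scoped to this dataset only."""
    for resource in dataset.resources:
        if resource.extras.get(REMOTE_ID_KEY) == remote_id:
            return resource
    return None

def upsert_resource(dataset: Dataset, res: dict) -> Resource:
    """Option 1: update the mapped local resource, or create one with a fresh local id."""
    resource = find_by_remote_id(dataset, res["id"])
    if resource is None:
        resource = Resource(url=res["url"], extras={REMOTE_ID_KEY: res["id"]})
        dataset.resources.append(resource)
    else:
        resource.url = res["url"]
    return resource

class ResourceIdConflict(Exception):
    """Option 2: raised when a harvested resource id is already used locally."""

def check_no_conflict(remote: dict, ids_in_other_datasets: Set[str]) -> None:
    """Option 2: reject the whole dataset if any of its resource ids is already taken.

    ids_in_other_datasets must include archived datasets but exclude the dataset
    being re-harvested, otherwise every routine re-harvest would be rejected.
    """
    clashes = sorted(r["id"] for r in remote["resources"] if r["id"] in ids_in_other_datasets)
    if clashes:
        raise ResourceIdConflict(f"dataset {remote['id']}: existing resource ids {clashes}")

# Option 1: an archived dataset and a new one can both map remote id "r-42".
archived, current = Dataset(), Dataset()
old = upsert_resource(archived, {"id": "r-42", "url": "https://remote.example/old.csv"})
new = upsert_resource(current, {"id": "r-42", "url": "https://remote.example/new.csv"})
# Re-harvesting the current dataset updates its resource in place.
upsert_resource(current, {"id": "r-42", "url": "https://remote.example/new-v2.csv"})

# Option 2: the same situation is rejected instead, pending manual cleanup.
try:
    check_no_conflict({"id": "ds-2", "resources": [{"id": "r-42"}]}, {"r-42"})
except ResourceIdConflict as err:
    print(err)  # dataset ds-2: existing resource ids ['r-42']
```

Scoping the option 1 lookup to the dataset being harvested is what lets two datasets reference the same remote id without clashing; option 2 keeps remote ids as local ids and simply surfaces the conflict for an admin to resolve.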