Feat/use dynamic harvest field #2762

Merged
merged 38 commits into from
Nov 10, 2022

Conversation

@maudetes maudetes commented Aug 22, 2022

Fix datagouv/data.gouv.fr#818, alternative to #2750

Uses a separate dynamic harvest document to store harvest information.
The core fields are defined in /dataset/models.py.
Any other entry can be added freely, but without validation.

Adds a migration (around 15 min locally) to move harvest metadata.

An explicit API field definition is needed for any core or additional field to be exposed by the API. See https://github.com/maudetes/udata/blob/67dfdda59750eb4224c07b1da4c792f41147a26f/udata/core/dataset/api_fields.py#L22 for the fields defined in udata core.
Other entries can be exposed by extending this field definition, e.g. in udata-ods:

from udata.api import fields
from udata.core.dataset.api_fields import dataset_harvest_fields

# Declare the extra harvest entry so that the API exposes it
dataset_harvest_fields['ods_url'] = fields.String(
    description='The ods url for ods harvested dataset',
    allow_null=True,
)
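
To illustrate why the explicit declaration matters, here is a minimal plain-Python sketch of whitelist-style exposure: only declared names reach the API output, whatever else is stored in the harvest document. It is a stand-in, not the actual udata marshalling code; `expose_harvest` and the sample values are hypothetical.

```python
from typing import Any, Dict, Iterable


def expose_harvest(harvest: Dict[str, Any], declared_fields: Iterable[str]) -> Dict[str, Any]:
    """Return only the declared entries of a harvest document.

    Hypothetical helper: undeclared entries are dropped, and declared
    entries that are missing from the document come out as None.
    """
    return {name: harvest.get(name) for name in declared_fields}


# Entries stored freely in the dynamic harvest document (sample values).
stored = {
    "source_id": "abc123",
    "remote_id": "dataset-42",
    "ods_url": "https://example.org/explore/dataset-42",
    "internal_note": "never declared, so never exposed",
}

# Names assumed to be declared in the API field definition.
declared = ["source_id", "remote_id", "ods_url"]

exposed = expose_harvest(stored, declared)
```

In this sketch `internal_note` stays in storage but is absent from `exposed`, which mirrors the rule that an entry must be declared before the API returns it.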

Harvest dates are now stored in the harvest metadata and no longer override the object's own dates.
We should therefore iterate on how to return the correct dates on the frontend (e.g. the max of the mongo object date and the harvest metadata date?).
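
The "max" option raised above could look like the following sketch. This is only one possible resolution, not a decision taken in this PR; the function name and the handling of a missing harvest date are assumptions.

```python
from datetime import datetime
from typing import Optional


def displayed_last_modified(internal: datetime, harvested: Optional[datetime]) -> datetime:
    """Pick the date to show: the most recent of the two candidates.

    `internal` is the date stored on the mongo object, `harvested` the one
    from the harvest metadata (None for datasets that were not harvested).
    Both are assumed to be naive datetimes or both timezone-aware.
    """
    if harvested is None:
        return internal
    return max(internal, harvested)


internal_date = datetime(2022, 11, 8, 13, 30)
harvested_date = datetime(2022, 11, 10, 9, 0)
shown = displayed_last_modified(internal_date, harvested_date)
```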

The exhaustive list of dataset extras that have been migrated to harvest metadata is:

  • uri (also for resource)
  • harvest:source_id
  • harvest:remote_id
  • harvest:last_update
  • harvest:domain
  • harvest:archived_at
  • harvest:archived
  • remote_url
  • More in harvest plugins (ods and ckan-prefixed extras)
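
The move of these extras into harvest metadata can be sketched as below. This is a simplified illustration on plain dicts, not the migration shipped in this PR. It assumes the target name is the extras key with its `harvest:` prefix dropped, and `split_harvest_extras` is a hypothetical helper. Plugin-specific extras (ods, ckan) are left out.

```python
from typing import Any, Dict, Tuple

HARVEST_PREFIX = "harvest:"
# Extras from the list above that carry no "harvest:" prefix.
UNPREFIXED_KEYS = {"uri", "remote_url"}


def split_harvest_extras(extras: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split extras into (harvest metadata, remaining extras)."""
    harvest: Dict[str, Any] = {}
    remaining: Dict[str, Any] = {}
    for key, value in extras.items():
        if key.startswith(HARVEST_PREFIX):
            # "harvest:source_id" -> "source_id" (assumed naming)
            harvest[key[len(HARVEST_PREFIX):]] = value
        elif key in UNPREFIXED_KEYS:
            harvest[key] = value
        else:
            remaining[key] = value
    return harvest, remaining


harvest, remaining = split_harvest_extras({
    "harvest:source_id": "abc123",
    "harvest:remote_id": "dataset-42",
    "uri": "https://example.org/datasets/42",
    "custom": "kept as a regular extra",
})
```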

TODO

  • Migration: restrict it to identifying fields only. Takes about 10 min locally.
  • Could some of the defined fields be replaced or merged? E.g. dct_identifier is the same as remote_id for DCAT-harvested datasets. Are both values needed? A first attempt at removing them: maudetes@a5a6d97. -> We keep them for now, see Feat/use dynamic harvest field #2762 (comment)

@maudetes maudetes requested a review from quaxsze October 10, 2022 17:05
@maudetes maudetes requested a review from abulte November 2, 2022 13:28

@abulte abulte left a comment

Looks nice, big work 👏 Code is much easier to read.

Really not sure about that dct_identifier thing...

NB: empty file here https://github.com/opendatateam/udata/pull/2762/files#diff-1a11729151284ff74f2c7d1cbec38f0e8960a87758b4c755bf3f9d612722cfce, maybe remove?

@maudetes maudetes marked this pull request as ready for review November 8, 2022 13:30
@maudetes maudetes requested a review from abulte November 8, 2022 13:33

@abulte abulte left a comment

Migration question

Successfully merging this pull request may close these issues.

Distinguish harvested and internal dates in udata
3 participants