Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE]fix(mtd): fixing several anomalies for mtd sync with inpn mtd #3230

Draft
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

VincentCauchois
Copy link
Member

PR created just to trigger checks, especially pytest check.
One goal is to assert that Python code is compatible with 3.9 version.

@VincentCauchois VincentCauchois changed the base branch from master to develop October 16, 2024 13:16
@VincentCauchois VincentCauchois changed the title fix(mtd): fixing several anomalies for mtd sync with inpn mtd [DO NOT MERGE]fix(mtd): fixing several anomalies for mtd sync with inpn mtd Oct 16, 2024
@VincentCauchois VincentCauchois marked this pull request as ready for review October 16, 2024 13:31
@VincentCauchois VincentCauchois marked this pull request as draft October 16, 2024 13:31
@VincentCauchois VincentCauchois changed the base branch from develop to master October 16, 2024 13:32
@VincentCauchois VincentCauchois changed the base branch from master to develop October 16, 2024 13:32
@VincentCauchois VincentCauchois changed the base branch from develop to master October 16, 2024 13:33
@VincentCauchois VincentCauchois marked this pull request as ready for review October 16, 2024 13:34
@VincentCauchois VincentCauchois marked this pull request as draft October 16, 2024 14:02
@VincentCauchois VincentCauchois force-pushed the fix/sync-mtd-inpn branch 2 times, most recently from a567a40 to 0344f84 Compare October 24, 2024 15:44
@VincentCauchois VincentCauchois force-pushed the fix/sync-mtd-inpn branch 3 times, most recently from 0775ec4 to 5aee135 Compare October 29, 2024 15:19
If sync_af(af) returns None, then it means the AF (Acquisition Framework) could not be retrieved.
Use NumPy/SciPy Dosctrings format rather than Google format.
Add type hint for returns of methods `get_af_list` and `get_ds_list` of class `MTDInstanceApi`.
Retrieve logger instance "MTD_SYNC" rather than root logger.
Improve logging at several stages:
- At the beginning of the processing of the dataset
- When the processing is skipped:
  - unknown nomenclature for data origin
  - parent acquisition framework not in database
Add logging at the beginning of the processing of the acquisition framework.
If an acquisition framework has no UUID provided, retrieval is skipped.
Add type hints for parameters of function `associate_actors`.
@VincentCauchois VincentCauchois force-pushed the fix/sync-mtd-inpn branch 3 times, most recently from 4918cbd to b3daed0 Compare November 4, 2024 10:33
Embed the final `insert` statement in a try-except to handle possible database integrity errors.

If such an integrity error happens, the current database session is rolled back ; important to allow further statements in the mtd synchronisation to succeed ; and a nicely formatted error message is logged.

Two utils functions for logging are added:
- `format_sqlalchemy_error_for_logging`: output a formatted error message with SQL query and params, from a SQLAlchemyError
- `format_str_dict_actor_for_logging`: output a formatted string with information from the dict `actor`
If no organism name is retrieved ; i.e. `organisn_name` is None ; the value for the field `bib_organismes.nom_organisme` must be set to None accordingly and not to the empty string "".
@VincentCauchois VincentCauchois force-pushed the fix/sync-mtd-inpn branch 6 times, most recently from 2c8cee3 to c1b5846 Compare November 4, 2024 15:03
In the case where the actor to be associated is tried to be retrieved as a user corresponding to the actor email, one must ensure that the role retrieved from `utilisateurs.t_roles` is an actual user and not a group.

There are database constraints on `gn_meta.cor_*_actor` tables that forbid the field `id_role` to be set with a value corresponding to a group.
Handle case when the insert in `gn_meta.cor_*_actor` table is intended with a role rather than an organism.
In this case, the `.on_conflict_do_nothing` method, used for the SQL instert statement, must be called with "id_role" rather than "id_organism" in the index_elements.
If there is no organism UUID for the actor of a metadata but there still is an organism name, an organism ID is retrieved:
- either as the ID of an eventually existing organism corresponding to this name
- or as the ID of a newly created organism with this name, and with a newly generated UUID // no email is retrieved in this case
An association between the metadata and this existing-or-new organism will then be retrieved.
In the function `associate_actors` make the association to be with an organism if one is retrieved, and if that is impossible with a user. Therefore, if both an organism and a user can be retrieved for the actor of the metadata, the association is made with the organism rather than with the user.
If not enough information could be retrieved to associate a metadata to an actor that is a "Contact Principal", then a specific user is used, and possibly created, to be associated as the "Contact principal" with the metadata.

This specific user has the following information:
- id_role, ID : 0 - particular positive integer value chosen to avoid conflict with other IDs, in particular those retrieved from the INPN
- desc_role, description : "Contact principal for 'orphan' metadata - i.e. with no 'Contact Principal' that could be retrieved during INPN MTD synchronisation"

In the case where not enough information could be retrieved to associate a metadata to an actor but the actor is not a "Contact principal", then the association is simply skipped and a warning message with details is logged.
Commit the retrieval of an acquisition framework to the database before processing the retrieval and association of its actors, i.e. the execution of `associate_actors`.
This prevents the retrieval of the AF not be to committed in the case there happen to be issues in the execution of `associate_actors`
Add an util function `format_acquisition_framework_id_from_xml` to handle, in particular, a specific case for the provided acquisition framework UUID for a dataset:
- Remove the possible prefix "http://oafs.fr/meta/ca/"

Note: such "http://oafs.fr/meta/ca/" prefix was noticed to be present for some datasets retrieved from INPN Métadonnées PREPROD for the instance 'Thématique' (ID : 2).
Two modifications:
- Clarification: add "single_" to clearly indicates that this function is suited for a XML ; from INPN MTD ; with only one acquisition framework
- Typo: correct "framwork" into "framework"
Add a function `parse_acquisition_frameworks_xml` used for parsing of a XML files with several acquisition frameworks retrieved from INPN MTD.
Use the function in the methods `MTDInstanceApi.get_af_list` and `MTDInstanceApi.get_list_af_for_user`, thus making them homogeneous with the methods `MTDInstanceApi.get_ds_list` and `MTDInstanceApi.get_ds_user_list`.
- fix user synchro mtd triggered by user actions
- feat improve debug logging for mtd sync process
- fix creation of user "Contact Principal for orphan metadata"
- fix dataset_name truncating and limiting to 255 char
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant