Issues with constraints on the mapping_justification slot #316

Open
gouttegd opened this issue Sep 11, 2023 · 1 comment

gouttegd (Contributor) commented Sep 11, 2023

The mapping_justification slot is defined in the LinkML model as follows:

mapping_justification:
  description: A mapping justification is an action (or the written representation of that action) of showing a mapping to be right or reasonable.
  range: EntityReference
  pattern: "^semapv:(MappingReview|ManualMappingCuration|LogicalReasoning|LexicalMatching|CompositeMatching|UnspecifiedMatching|SemanticSimilarityThresholdMatching|LexicalSimilarityThresholdMatching|MappingChaining)$"
  required: true
  any_of:
    - equals_string: semapv:LexicalMatching
    - equals_string: semapv:LogicalReasoning
    - equals_string: semapv:CompositeMatching
    - equals_string: semapv:UnspecifiedMatching
    - equals_string: semapv:SemanticSimilarityThresholdMatching
    - equals_string: semapv:LexicalSimilarityThresholdMatching
    - equals_string: semapv:MappingChaining
    - equals_string: semapv:MappingReview
    - equals_string: semapv:ManualMappingCuration

There are several issues with this definition:

  1. Why both a pattern constraint and an any_of constraint? My understanding is that they are redundant. Expressing the same constraint twice in two different forms creates the risk of the two forms becoming out-of-sync if someone updates, say, the any_of list but forgets to update the pattern expression accordingly (a risk made slightly greater by the fact that the allowed values are not listed in the same order in both forms).

  2. Both lists are already out-of-sync with the Semantic Mapping Vocabulary which, as of today, defines at least three more “matching processes”:

  • https://w3id.org/semapv/vocab/BackgroundKnowledgeBasedMatching
  • https://w3id.org/semapv/vocab/InstanceBasedMatching
  • https://w3id.org/semapv/vocab/MappingInversion

Ultimately, the definition should probably make use of LinkML’s dynamic enums, to avoid having to manually update the constraints in the SSSOM schema every time the semantic mapping vocabulary is enriched.

  3. The equals_string constraints force the slot to have the range string. The LinkML specification is explicit:

the slot must have range string and the value of the slot must equal the specified value

But SSSOM defines mapping_justification as an EntityReference, which is ultimately a uriOrCurie, which in LinkML is a base type unrelated to string.

  4. Independently of the typing issue above, both the pattern and the any_of constraints force the value to be in CURIE form, even though the underlying uriOrCurie type allows for either a CURIE or a URI.
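
The redundancy, the drift from the vocabulary, and the CURIE-only restriction described above can be checked with a small script. This is only a minimal sketch in plain Python, not how LinkML itself validates: the two constraints are copied from the slot definition quoted above, the helper names (pattern_values, to_uri, accepted) are hypothetical, and the semapv prefix expansion is assumed from the vocabulary URLs listed above.

```python
import re

# Both constraints, copied from the slot definition quoted above.
PATTERN = (
    r"^semapv:(MappingReview|ManualMappingCuration|LogicalReasoning|"
    r"LexicalMatching|CompositeMatching|UnspecifiedMatching|"
    r"SemanticSimilarityThresholdMatching|LexicalSimilarityThresholdMatching|"
    r"MappingChaining)$"
)
ANY_OF = [
    "semapv:LexicalMatching",
    "semapv:LogicalReasoning",
    "semapv:CompositeMatching",
    "semapv:UnspecifiedMatching",
    "semapv:SemanticSimilarityThresholdMatching",
    "semapv:LexicalSimilarityThresholdMatching",
    "semapv:MappingChaining",
    "semapv:MappingReview",
    "semapv:ManualMappingCuration",
]

# The three vocabulary terms mentioned above that neither constraint lists.
NEWER_TERMS = [
    "semapv:BackgroundKnowledgeBasedMatching",
    "semapv:InstanceBasedMatching",
    "semapv:MappingInversion",
]

# Assumption: prefix expansion inferred from the vocabulary URLs above.
SEMAPV_BASE = "https://w3id.org/semapv/vocab/"


def pattern_values(pattern):
    """Expand a `^semapv:(A|B|...)$` pattern into the set of CURIEs it accepts."""
    m = re.fullmatch(r"\^semapv:\((.*)\)\$", pattern)
    if m is None:
        raise ValueError("pattern is not a simple semapv alternation")
    return {"semapv:" + name for name in m.group(1).split("|")}


def to_uri(curie):
    """Expand a semapv CURIE into its full URI form (hypothetical helper)."""
    prefix, local_name = curie.split(":", 1)
    if prefix != "semapv":
        raise ValueError("only the semapv prefix is handled in this sketch")
    return SEMAPV_BASE + local_name


def accepted(value):
    """Return True if the value satisfies the pattern constraint."""
    return re.match(PATTERN, value) is not None


# Do the two redundant forms still agree with each other?
in_sync = pattern_values(PATTERN) == set(ANY_OF)

# Which of the newer vocabulary terms does the pattern reject?
rejected_newer = [term for term in NEWER_TERMS if not accepted(term)]

# The same term in CURIE form and in full URI form.
curie_ok = accepted("semapv:LexicalMatching")
uri_ok = accepted(to_uri("semapv:LexicalMatching"))

print("pattern and any_of agree:", in_sync)
print("newer terms rejected:", rejected_newer)
print("CURIE accepted:", curie_ok, "| URI accepted:", uri_ok)
```

Today the two forms happen to agree, but nothing enforces that. The script also shows that the three newer terms are rejected, and that a full URI for an allowed term is rejected even though uriOrCurie permits it.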

matentzn (Collaborator) commented

Thank you @gouttegd, your analysis is spot on. Back when this was proposed, dynamic enums did not exist yet, and there was no great way to constrain a field like this, so we resorted to regex. I think I was originally recommended any_of, but the validation framework back then did not process it, so I added the regex afterwards.

In any case you are 💯 correct that we should switch to dynamic enums.
