Proposal for CurieNamespace with embedded catalog. #244

hsolbrig · 2023-02-02T20:25:37Z

It is my understanding that we are supposed to use the curies package, but it isn't clear how one would add maps incrementally (see: tests/test_utils/test_curienamespace.py#113 as an example). Any solution that passes test_curienamespace.py should do.

kervel · 2023-02-04T09:33:22Z

i guess the test fails because the curienamespace catalog was already populated from test_enum which ran before this test...

kervel · 2023-02-08T17:50:18Z

Wouild it be an option to make the catalog an instance variable instead of a class variable ? if we want to scope namespaces somehow that will be easier ? And it doesn't require passing a curieNamespaces object to each and every function call because we can still have a module-level defined singleton in generated code ?

Other way to fix it is some pre-test code that cleans up the catalog

cmungall · 2023-02-23T01:14:06Z

comments from @kervel on slack:

fails tests because of global prefix catalog in the curienamespaces class. If we would change the order of the tests it would probably pass, but i think moving the catalog out of a class variable so that it becomes scoped sounds like a more robust solution. I cannot estimate the impact this would have on other uses of the prefix catalog

I think I agree but I also think I am lacking a bit of context here. It would be helpful to state here or in the initial comment what the motivation for this PR is, and a high level description of what it seeks to achieve.

kervel · 2023-02-23T09:39:37Z

i'm trying to state the motivation here:

the idea is that the python data classes, both when used to generate json and to parse json, should be able to interpret the type designator URI/CURIES in the most broad/flexible way possible. This is especially true if the range of a type designator field is "uriorcurie".

There are several uri's to designate a type: one has both the native and the assigned (via class_uri uri's for a class), which should yield the same behaviour when used as a type designator. And then both of them can be shortened to curies using any of the prefixes in the prefix map. If a user defines multiple prefixes to the same uri namespaces (for instance because he includes yaml written by someboyd else), he would expect all known prefixes to just work.

so

The idea is for the python data classes to use the metadata already available to do this translation, rather than (like we do in pydantic) just having an exhaustive list of accepted uri's / curies as string values (which makes sense for pydantic because there we don't even depend on linkml-runtime)

so this PR adds functions that translate uries to curies and back using metadata that is in linkml runtime curie namespaces, an object which is already available in the python dataclasses right now.

kervel · 2023-02-27T11:49:39Z

hello, the reason this PR doesn't use the urieconverter from https://github.com/cthoyt/curies is laid out in the first comment here. So i created an issue there biopragmatics/curies#33 to see if that can be fixed (author replied in the mean time)

@hsolbrig biopragmatics/curies#34 . is the following okay ?

factor out a separate class CurieNamespaceCatalog that manages the catalog as an instance variable ?
make this class a thin wrapper around the curies package ? Do we even need a wrapper ?

then we could do

catalog = CurieNamespaceCatalog()
LINKML = CurieNamespace('linkml', 'https://w3id.org/linkml/', catalog=catalog)
SHEX = CurieNamespace('shex', 'http://www.w3.org/ns/shex#', catalog=catalog)
XSD = CurieNamespace('xsd', 'http://www.w3.org/2001/XMLSchema#', catalog=catalog)
DEFAULT_ = CurieNamespace('', 'http://example.org/', catalog=catalog)

or alternatively

catalog = CurieNamespaceCatalog()
LINKML = catalog.namespace('linkml', 'https://w3id.org/linkml/')
SHEX = catalog.namespace('shex', 'http://www.w3.org/ns/shex#')
XSD = catalog.namespace('xsd', 'http://www.w3.org/2001/XMLSchema#')
DEFAULT_ = catalog.namespace('', 'http://example.org/')

cthoyt · 2023-02-27T13:36:03Z

@kervel @hsolbrig the new curies v0.4.3 with incremental construction has been released to PyPI. You can upgrade and use something like the following:

import curies

converter = curies.Converter(records=[])
converter.add_prefix("hgnc", "https://bioregistry.io/hgnc:")

kervel · 2023-03-01T15:17:31Z

hello, the test done by @hsolbrig still fails when trying to convert to the new curies v0.4.3:

        # Test incremental add
        CurieNamespace('tester', URIRef('http://fester.bester/tester#')).addTo(converter)
        self.assertEqual('tester:hip_boots', namespaceCatalog.to_curie('http://fester.bester/tester#hip_boots'))

        # Test multiple prefixes for same suffix
        CurieNamespace('ns17', URIRef('http://fester.bester/tester#')).addTo(converter)

here, ns17 is actually a synonym. so curies refuses to take it as a new prefix. off course this could be fixed so that curies automatically converts this to a synonym for an existing Record, but that would complicate things.

this would also fail (a bit further down the same test)

        # Test multiple uris for same prefix
        # The following should be benign
        CurieNamespace('tester', URIRef('http://fester.bester/tester#')).addTo(converter)

        # Issue warnings for now on this
        # TODO: test that we log the following
        #       'Prefix: tester already references http://fester.bester/tester# -
        #       not updated to http://fester.notsogood/tester#'
        CurieNamespace('tester', URIRef('http://fester.notsogood/tester#')).addTo(converter)

cthoyt · 2023-03-01T15:27:48Z

It's possible I could implement something like add_prefix_synonym and add_uri_prefix_synonym to curies.Converter if that would be a good stop-gap before making some more fundamental improvements to the way you're doing bookkeeping

kervel · 2023-03-01T15:31:26Z

@cthoyt i'd rather first get some feedback from the people that know this code base better than i do. the originally proposed api in linkml-runtime doesn't even make a distinction between a synonym and a new prefix (as can be seen from the tests).

kervel · 2023-03-01T16:17:00Z

i created a working proposal here #251 ... i (for now) now bypass the incremental building. code is way too complicated: i need to detect synonyms and i can have transitive equality if you both have prefix synonyms and uri synonyms

not feeling very confident on this

hsolbrig mentioned this pull request Feb 2, 2023

Polymorphism w type designator: make linkml-validate work by adding support in dataclasses + enable include_range_class_descendants for json schema validator linkml/linkml#1257

Closed

kervel mentioned this pull request Feb 27, 2023

Public API to build a converter incrementally biopragmatics/curies#33

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for CurieNamespace with embedded catalog. #244

Proposal for CurieNamespace with embedded catalog. #244

hsolbrig commented Feb 2, 2023

kervel commented Feb 4, 2023

kervel commented Feb 8, 2023

cmungall commented Feb 23, 2023

kervel commented Feb 23, 2023

kervel commented Feb 27, 2023 •

edited

Loading

cthoyt commented Feb 27, 2023 •

edited

Loading

kervel commented Mar 1, 2023

cthoyt commented Mar 1, 2023

kervel commented Mar 1, 2023

kervel commented Mar 1, 2023

Proposal for CurieNamespace with embedded catalog. #244

Are you sure you want to change the base?

Proposal for CurieNamespace with embedded catalog. #244

Conversation

hsolbrig commented Feb 2, 2023

kervel commented Feb 4, 2023

kervel commented Feb 8, 2023

cmungall commented Feb 23, 2023

kervel commented Feb 23, 2023

kervel commented Feb 27, 2023 • edited Loading

cthoyt commented Feb 27, 2023 • edited Loading

kervel commented Mar 1, 2023

cthoyt commented Mar 1, 2023

kervel commented Mar 1, 2023

kervel commented Mar 1, 2023

kervel commented Feb 27, 2023 •

edited

Loading

cthoyt commented Feb 27, 2023 •

edited

Loading