split centres.xml across the individual recommendations? #247

Closed
bansp opened this issue Oct 26, 2023 · 9 comments
Labels
priority RI-interoperability matters regarding broadly understood interoperability with other Research Infrastructures SIS:centres centre-oriented part of the SIS SIS:formats format description and placement in the SIS SIS:schemas document grammars for parts of the SIS DB

@bansp
Member

bansp commented Oct 26, 2023

I am not sure if we can find serious arguments against keeping the centre info separate from the recommendations.

Currently, we list 51 centres and have 52 recommendation documents, and that's already a yellow blinking light.

Each recommendation document describes a single centre, and the two are connected only implicitly, via the file name. So, the info on the IDS in centres.xml is:

  <centre id="IDS" deposition="1">
    <name>Leibniz-Institut für Deutsche Sprache</name>
    <a href="https://centres.clarin.eu/centre/11"/>
    <nodeInfo>
      <ri status="B-centre">CLARIN</ri>
      <ri status="Collections, Lexical Resources, Operations">Text+</ri>
    </nodeInfo>
  </centre>

... and the filename of the recommendations is IDS-recommendation.xml. The recommendation file has a header, currently set to the following (and note the redundancy between the file name and the filter element):

  <header>
    <lastUpdateCommitID>76e7d21871f644d0ea8b5b62811a493304d9cd18</lastUpdateCommitID>
    <filter>
      <centre>IDS</centre>
    </filter>
    <respStmt>
      <curator>Piotr Banski</curator>
      <github>https://github.com/bansp</github>
      <reviewDate>2023-07-24</reviewDate>
    </respStmt>
  </header>

(where the maintainer info was added for the sake of a demo at CAC-2023 and I don't consider it binding...)
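
To make the "51 vs. 52" mismatch concrete, here is a minimal sketch of a consistency check between the centre list and the recommendation file names. The `<centres>` root element, the miniature inputs, and the function name are assumptions made for illustration; only the `IDS-recommendation.xml` naming pattern and the `centre/@id` attribute come from the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature inputs; the real data lives in centres.xml and
# in the individual *-recommendation.xml files.
CENTRES_XML = """<centres>
  <centre id="IDS" deposition="1"/>
  <centre id="HZSK"/>
</centres>"""
RECOMMENDATION_FILES = ["IDS-recommendation.xml", "SAW-recommendation.xml"]

SUFFIX = "-recommendation.xml"


def compare_centres_and_files(centres_xml, filenames):
    """Return (centre IDs without a file, file IDs without a centre), both sorted."""
    root = ET.fromstring(centres_xml)
    centre_ids = {c.get("id") for c in root.iter("centre")}
    # The link is implicit: the centre ID is the file name minus the suffix.
    file_ids = {f[: -len(SUFFIX)] for f in filenames if f.endswith(SUFFIX)}
    return sorted(centre_ids - file_ids), sorted(file_ids - centre_ids)


missing_files, orphan_files = compare_centres_and_files(CENTRES_XML, RECOMMENDATION_FILES)
print("centres without a recommendation file:", missing_files)
print("recommendation files without a centre:", orphan_files)
```

A check like this becomes unnecessary once the centre info lives inside the recommendation file itself, which is the point of the proposal.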

Arguments for merging:

  • data modelling: these pieces belong together, the relationship between the centre element and the relevant recommendation file should always be 1:1
  • ease of maintenance: we'd only expect PRs of the recommendation files
  • logic / data modelling: centres info is secondary to recommendations; it is not independent; we are not a centre-information facility -- that is the role of the CLARIN DB and the databases of other research infrastructures; we should only query those DBs in a sanity-checking operation, once in a while
  • again a bit of a data-modelling issue: IF, say, centre X recommends one set of formats as a CLARIN/Text+ centre, and another as a DARIAH centre (and it's really a single institution, and that's how we want to represent it), then we can allow an RI-specific attribute on <formats>: <formats ri="dariah"> would be DARIAH-specific while <formats> would apply to all the RIs involved. (This is not yet technically possible and won't be until we see the need for such an arrangement, but it would be good to have the base for that in place.)
  • centres list is a projection of recommendations; we don't use it separately for the (currently somewhat dormant) standards information part; it doesn't make sense to say that one centre does or does not use the given standard; the standards information should exist separately, and be information on standards, with maaaaybe a mention of some centre in the overall description. We don't need a skeletal list of centres, used by both format recommendations and standards info.
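
The RI-specific `<formats>` idea from the list above could be sketched roughly as follows. This is purely illustrative: the `ri` attribute does not exist yet, and the `<recommendation>` root, the `fExample` id, and the `formats_for` helper are assumptions, not part of the current schema or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical recommendation fragment; @ri on <formats> is only a proposal.
RECOMMENDATION_XML = """<recommendation>
  <formats>
    <format id="fComa"/>
  </formats>
  <formats ri="dariah">
    <format id="fExample"/>
  </formats>
</recommendation>"""


def formats_for(recommendation_xml, ri):
    """Collect format ids that apply to `ri`.

    A <formats> block without @ri applies to every RI; a block with @ri
    applies only to that RI (compared case-insensitively).
    """
    root = ET.fromstring(recommendation_xml)
    ids = []
    for block in root.iter("formats"):
        block_ri = block.get("ri")
        if block_ri is None or block_ri.lower() == ri.lower():
            ids.extend(f.get("id") for f in block.findall("format"))
    return ids


print(formats_for(RECOMMENDATION_XML, "DARIAH"))  # ['fComa', 'fExample']
print(formats_for(RECOMMENDATION_XML, "CLARIN"))  # ['fComa']
```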

Argument against: technical complexity and bugs cropping up after the rearrangement, but that's a non-argument, really. It might be an argument for releasing 3.0.

@bansp bansp added SIS:formats format description and placement in the SIS SIS:schemas document grammars for parts of the SIS DB SIS:centres centre-oriented part of the SIS labels Oct 26, 2023
@bansp bansp added this to the SIS v. 2.8.0 milestone Oct 26, 2023
@bansp
Member Author

bansp commented Oct 26, 2023

To be sure: I suggest that each <centre> element goes into the header of a recommendations file, and that we get rid of centres.xml.
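
A minimal sketch of that move, assuming a `<centres>` root in centres.xml and a `<recommendation>` root around the header (both root names and the `embed_centre` helper are hypothetical; the `<centre>`, `<header>`, and `<filter>` elements follow the examples above):

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical miniature inputs modelled on the snippets quoted in this issue.
CENTRES_XML = """<centres>
  <centre id="IDS" deposition="1">
    <name>Leibniz-Institut für Deutsche Sprache</name>
  </centre>
</centres>"""

RECOMMENDATION_XML = """<recommendation>
  <header>
    <filter>
      <centre>IDS</centre>
    </filter>
  </header>
</recommendation>"""


def embed_centre(centres_xml, recommendation_xml):
    """Copy the matching <centre> element into the recommendation header."""
    centres = ET.fromstring(centres_xml)
    rec = ET.fromstring(recommendation_xml)
    header = rec.find("header")
    # The filter currently carries the centre ID that links the two files.
    centre_id = header.findtext("filter/centre")
    centre = centres.find(f"centre[@id='{centre_id}']")
    if centre is None:
        raise LookupError(f"no <centre> with id {centre_id!r} in the centre list")
    header.append(copy.deepcopy(centre))
    return rec


merged = embed_centre(CENTRES_XML, RECOMMENDATION_XML)
print(ET.tostring(merged, encoding="unicode"))
```

Once every recommendation file carries its own `<centre>` element, the list of centres can be derived from the files and centres.xml can go.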

Being able to use the attribute @ri on <formats> is one thing that would follow. Another would be the use of that attribute on <name>, to differentiate between the centre name used for one RI vs. another (this is with reference to issue #245 ).

@bansp bansp added the RI-interoperability matters regarding broadly understood interoperability with other Research Infrastructures label Oct 26, 2023
@bansp
Member Author

bansp commented Oct 26, 2023

Ah, incidentally, the <a> element is extremely anti-semantic, we should get it replaced, probably with added flexibility for RIs.

@margaretha
Collaborator

Eliza will move centre elements to the recommendation files.

@bansp
Member Author

bansp commented Dec 20, 2023

I will add the element as optional, for the time being, to the schema for recommendations. After the move, I'll make it obligatory.

@margaretha
Collaborator

I mean in export files when the format recommendations are not filtered by centreID, e.g.

<format id="fComa">
   <centre>HZSK</centre>
   <domain>Contextual Information</domain>
   <level>recommended</level>
</format>
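
For illustration only, an unfiltered export of this shape could be assembled along the following lines. The `<export>` and `<formats>` wrappers, the input mapping, and the `export_all` helper are assumptions and not the actual SIS export code.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical input: centre ID -> the <format> entries from its recommendation file.
RECOMMENDATIONS = {
    "HZSK": (
        '<formats><format id="fComa">'
        "<domain>Contextual Information</domain>"
        "<level>recommended</level>"
        "</format></formats>"
    ),
}


def export_all(recommendations):
    """Merge all recommendations, tagging each <format> with its centre."""
    export = ET.Element("export")  # hypothetical wrapper element
    for centre_id, xml_text in recommendations.items():
        for fmt in ET.fromstring(xml_text).iter("format"):
            entry = copy.deepcopy(fmt)
            centre = ET.Element("centre")
            centre.text = centre_id
            # Put the centre first, as in the example above.
            entry.insert(0, centre)
            export.append(entry)
    return export


export = export_all(RECOMMENDATIONS)
print(ET.tostring(export, encoding="unicode"))
```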

@margaretha
Collaborator

I have added centre elements in the recommendation files and updated the code to use them instead of the centres.xml. Please check the dev branch.

mmatthiesencsc added a commit to CSCfi/Kielipankki-standards that referenced this issue Feb 15, 2024
* Added domains of recommendations in centre pages (close clarin-eric#240)

* Implemented multiple curators (clarin-eric#238)

* Updated the style of multiple curators and added an example (clarin-eric#238)

* Make base URL semi dynamic (close clarin-eric#248)

* Added domains of recommendations in format pages (clarin-eric#240)

* addresses clarin-eric#238

* KP-7936 Update FIN-CLARIN-recommendation.xml

We went through all functional domains and added formats as we saw relevant. We skipped domains we deemed not relevant to Kielipankki.

* add jussi + stub for <info>

* make "centre" optional (for now) in the header - addresses clarin-eric#247 , change the former "centre" to "centreID" - references clarin-eric#249

* change "centre" to "centreID" in the filter field - addresses clarin-eric#249

* Fixed centreID.

* Added centre elements in recommendation files (clarin-eric#247)

* Updated references from filter/centre to filter/centreID (clarin-eric#249)

* Updated centre model to use recommendation files instead of centres.xml
(clarin-eric#247)

* Added SAW recommendation (clarin-eric#247)

* add "centreID" as an optional child of "format", with an annotation stating its purpose; see clarin-eric#249 (comment)

* KP-7936 add PDF for documentation, add review date

* KP-7936 remove PDF* for textual src lang data

* KP-7936 add info text based on Språkbanken

* KP-7936 Update FIN-CLARIN-recommendation.xml

We went through all functional domains and added formats as we saw relevant. We skipped domains we deemed not relevant to Kielipankki.

* add jussi + stub for <info>

* KP-7936 add PDF for documentation, add review date

* KP-7936 remove PDF* for textual src lang data

* KP-7936 add info text based on Språkbanken

* KP-7936 fix indent

---------

Co-authored-by: margaretha <[email protected]>
Co-authored-by: piotr <piotr@bodysek>
@bansp
Member Author

bansp commented Mar 26, 2024

Ah, I was wondering when I saw the duplicated info, both in centres.xml and in the individual recommendations. And it's been waiting for me! :-)
Well, let it wait a while longer, since we have the issue open still so it won't be forgotten.
To be sure: the move looks fine to me, but I've only had a look rather than trying out a version with centres.xml either thinned down or removed altogether.
I'll get to that in mid-May, I expect.

@bansp
Member Author

bansp commented May 17, 2024

It's mid-May so I am going to have a look now and most probably close it.

@bansp
Member Author

bansp commented May 23, 2024

Don't want to close the ticket yet, because the info is still duplicated. I suggest that we remove centres.xml after the upcoming release and then deal with potential consequences. I'm moving this ticket to 2.8.0; it already has the priority label.

Edit: oh let me wait with the milestone switch until tomorrow (May 24th). Maybe we can remove centres.xml right away.
