Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add identifierNAME to NXobject #1486

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

lukaspie
Copy link
Contributor

@lukaspie lukaspie commented Sep 30, 2024

Implements vote in #1451 (comment).
Depends on #1485

Any implementation should consider comments in #1416

@@ -30,5 +30,25 @@
This is the base object of NeXus
</doc>
<!--attribute name="name"><doc>name of instance</doc></attribute-->
<field name="identifierNAME" type="NX_CHAR" nameType="partial">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it fits in the list of reserved prefixes. Add it to that list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other prefixes are only defined in the documentation. Should we make them available to XML by adding them here, similar to "identifierNAME"? Or do this in a separate issue/branch/PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added identifier to the reserved prefixes.

Copy link
Contributor Author

@lukaspie lukaspie Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other prefixes are only defined in the documentation. Should we make them available to XML by adding them here, similar to "identifierNAME"? Or do this in a separate issue/branch/PR?

I think it is good to add them to NXobject directly. That still doesn't automatically make them reserved (it is still technically possible to overwrite them somewhere else), but at least the docs can be clearly defined and then inherited.

I suggest to do it in a different PR since for the identifier change we already have a NIAC vote (not sure if the other changes would need a separate one).

@rayosborn
Copy link
Contributor

rayosborn commented Oct 4, 2024

I thought that, in the end, we decided against using "identiferNAME", instead of an identifier attribute, just called "identifier". Obviously, @sanbrock has the official record.

@prjemian
Copy link
Contributor

prjemian commented Oct 4, 2024

If it is an attribute, then it would be added to the nxdl.xsd schema.

@rayosborn
Copy link
Contributor

Isn't it possible also to add attributes to the NXobject class at least for groups?

@prjemian
Copy link
Contributor

prjemian commented Oct 4, 2024 via email

@lukaspie
Copy link
Contributor Author

lukaspie commented Oct 7, 2024

I added some of the feedback provided by @paulmillar in #1416 with respect to the type of identifier. He made a strong point that service is not the right word, but rather something like type.

Note that here I am still adding identifierNAME as a field rather than an attribute. My argument for this is that if we were to use an attribute and still keep the service and is_persistent (as was agreed if I recall correctly), we would need to add three attributes like this: identifierNAME, identifier_serviceNAME, identifier_is_persistentNAME, which would be rather messy. Since identifier will be added to the reserved prefixes anyway, why not make it a field with the other two concepts as attributes? Maybe @sanbrock can also chime in with his opinion.

I thought that, in the end, we decided against using "identiferNAME", instead of an identifier attribute, just called "identifier". Obviously, @sanbrock has the official record.

My understanding was that it shall be possible to have more than one identifier (e.g., from different services) for one object. I think this is also what @sanbrock noted. Even if identifier will become an attribute, I would suggest to use <attribute name="identifierNAME" type="NX_CHAR" nameType="partial">.

@prjemian
Copy link
Contributor

prjemian commented Oct 7, 2024

We have been reducing use of type to a few limited situations because its meaning varies with context. XML Schema, XML, NXDL, numeric, data, units, ... (I'm certain I've missed a few.) All of these want to use this noun. Quickly, it became bewildering. At the end of the day, the XML Schema and the NXDL data type win. Here's an example:

<xs:element name="definition" type="nx:definitionType">

Please pick a noun which is not type.

@rayosborn
Copy link
Contributor

At the NIAC, I recall that people (including, I believe, @phyy-nx) thought the "is_persistent" tag unnecessary. We need @sanbrock to confirm, but I also believe that the attributes would just be "identifier" and "service." There is no need to add partial names here, which I agree become very messy. I have no idea whether we specifically voted on any of these issues.

@sanbrock
Copy link
Contributor

sanbrock commented Oct 7, 2024

We have not voted on it but agreed that a workgrouo shall conclud with a reasonable solution in a form of a PR which can then be voted on.

@sanbrock
Copy link
Contributor

sanbrock commented Oct 7, 2024

Indeed, we qgreef to use type instead of service.

@sanbrock
Copy link
Contributor

sanbrock commented Oct 7, 2024

There was no real conclusion on is_persistant. Indeed, its necessity was debated, although I argued that it is acually good to know if an Idebtifier is a PID or not.
Ww may state that the type holds this information. E.g. doi or orcid are good examples of PIDs.

@rayosborn
Copy link
Contributor

@sanbrock, do you believe we came to any conclusion about the need for partial names? Attribute names such as "identifierDOI" are pretty ugly IMHO. If there is going to be a working group, perhaps this PR should be, at least temporarily, withdrawn.

@sanbrock
Copy link
Contributor

sanbrock commented Oct 7, 2024

Note that we did agree that our solution shall allow attaching multiple identifiers (e.g. orcid, linkdin to a USER).

  • identifierNAME with nameType=partial would do that job.
  • identifier as a reserved prefix would also do that (but this solution is rather a convention /only appears in the documentation and/or in the xsd descriptuon of the NeXus Definiion Language/ rather than an asserted statement written in NXDL in a definition. While the former requires hard coded implementation in all NEXUS tools in any programming languages used, the later would be inferred automatically by any NEXUS tools which has implemented the interpretation of NeXus definitions written in NXDL. Because of this, I would prefer @identifierNAME with @identifier_typeNAME declared in NXobject over the actual simpler solution of listing identifier among the reserved prefixes.
  • identifierNAME as a deckared Field in NXobject would also do the same, but it would allow adding @type to it. This seems to be the most elegant solution. Knowing that only a few udentifiers will be provided in practice for a given group, I do not think it has disadvantages compared to using attributes.

@sanbrock
Copy link
Contributor

sanbrock commented Oct 7, 2024

@sanbrock, do you believe we came to any conclusion about the need for partial names? Attribute names such as "identifierDOI" are pretty ugly IMHO. If there is going to be a working group, perhaps this PR should be, at least temporarily, withdrawn.

This is actually a draft PR. And we are the workgroup. Note that we wanted to invite @paulmillar too to this duscussion.

@sanbrock
Copy link
Contributor

sanbrock commented Oct 7, 2024

Note that another alternative is
@identifierTYPE - this allows the use of identifier_orcid, identifier_url, identifier_iri, etc. in the data file, and would make the @identifier_typeNAME (or identifier/@type) as an extra attribute unnecessary.

@sanbrock
Copy link
Contributor

sanbrock commented Oct 7, 2024

NIAC suggestions on the revwrite of NXidentifirler were recorded in #1451

@sanbrock
Copy link
Contributor

sanbrock commented Oct 7, 2024

These have been also voted for in Session J. See https://www.nexusformat.org/content/NIAC2024_minutes/


This refers specifically to an ID in the Handle system operated by the Corporation for National Research Initiatives (CNRI).

Syntax: hdl:prefix/identifier

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the hdl: prefix in the syntax? I don't think this adds anything useful because the type attribute already indicates the identifier is a Handle.

I suggest the syntax is updated to:

Syntax: prefix/identifier
Example: 123456789/abc123

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, changed


The IGSN is a unique identifier assigned to a specific sample or specimen in the context of scientific research.

Syntax: https://igsn.org/{IGSN-ID}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this syntax is correct.

Per the Wikipedia article, IGSNs are now issued by DataCite.

Also, from this wikipedia article, an example of a DateCite issued IGSN is 10.58052/SSH000SUA. Note that this is also a DOI.

Per the partnership agreement:

Existing IGSN ID handles will now be registered IGSN ID
DOIs and the handles aliased to the DOIs to ensure that
these continue to resolve. 

I take this to mean that there are now DataCite-issued DOIs for all IGSNs, including those historical IGSNs issued before the partnership.

Therefore, I believe NeXus can use the same syntax for DOIs and IGSNs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this was the older syntax. I added some text describing this change:

Since 2021, IGSNs are issued by DataCite, meaning that hat there are now DataCite-issued DOIs for all IGSNs, including those historical IGSNs issued beforehand. Therefore, the syntax is the same as for DOIs.

Syntax: 10.XXXX/XXXXXX
Example: 10.1107/S1600576714027575

An ISNI is made up of 16 digits, the last character being a check character. The check character may be either a decimal digit
or the character “X”.

Syntax: https://isni.org/isni/{ISNI-ID}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the https://isni.org/isni/ prefix?

Couldn't we store the ISNI-ID number and document that a URL may be derived from the ISNI by adding the prefix?

i.e.,
Syntax: 16 base-10 digits stored without any spaces.

Apparently, specifying the lack of spaces is important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I used

A URL can be generated from the ISNI ID by combining it with the prefix https://isni.org/isni/, resulting in
https://isni.org/isni/{ISNI-ID}.
						
Syntax: 16 base-10 digits stored without any spaces.
Example: 0000000121032683


An ISSN is an 8-digit unique identifier used to distinguish a serial publication, whether in print or electronic form.

Syntax: ISSN XXXX-XXXX

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I don't think the prefix ISSN in the syntax adds anything useful as the type attribute already identifies the value as an ISSN.

Also (borrowing from Wikipedia) the syntax might be better expressed as NNNN-NNNC

where N is in the set {0,1,2,...,9}, a decimal digit character, and C is in {0,1,2,...,9,X};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I aded text describing that the last digit is a check character and changed the syntax as you suggested.

Persistent identifiers are also known as PIDs.
</doc>
<attribute name="type" type="NX_CHAR">
<doc>The type of identifier used.</doc>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to include a comment about using the most specific type when describing the identifier.

For example, all IGSNs are DOIs and all DOIs are Handles; however, an IGSN should have type IGSN (and not DOI or Hdl).

Similarly, an ARK, Purl, ORCID and ROR identifiers should have their corresponding types and should not use the more generic URL identifier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fully agreed, I added a recommendation along the lines you describe here

</item>
<item value="URN">
<doc>
Uniform Resource Name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should mention that identifiers with more specific type attribute values should not be stored as a URN, even when this is valid.

The two example that springs to mind are DOI and ISSN, but there may be others with valid URN namespaces.

The doi URN namespace has been registered, so the URN doi:10.1107/S1600576714027575 is a valid URN-based representation for the DOI 10.1107/S1600576714027575.

Similarly, the ISSN URN namespace has been registered, so the URN URN:ISSN:1234-1231 is a valid URN that refers to the ISSN 1234-1231.

However, I believe we shouldn't allow DOIs (and IGSNs) and ISSNs to be written using the URN type, but rather with their more specific types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

It is recommended that identifiers with more specific type attribute (such as DOI or ISSN) values should not be stored as a URN, even when this is valid. As an example, the URN doi:10.1107/S1600576714027575 is a valid URN-based representation for the DOI 10.1107/S1600576714027575, but it is strongly recommended to use type="DOI" in this case.

``SAS_`` attributes reserved for use by canSAS https://www.cansas.org
``SILX_`` attributes reserved for use by silx https://www.silx.org
============ ================== ============================================ =============================================================
reserved prefixes; identifier

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there are some formatting changes.

IMHO, it would be better to separate out layout/formatting changes from content-modifying changes. This patch should add the identifier row, but not modify the existing rows (if possible).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically just changed the length of the ============ part to accomodate identifier. Unfortunately, the doc build currently does not work (because of the new nameType="partial"), but we can certainly check afterwards if that formatting change is absolutely neccessary.

@lukaspie
Copy link
Contributor Author

CI/CD failing because multi-line doc handling is not implemented (see #1491)

@lukaspie lukaspie marked this pull request as ready for review October 17, 2024 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants