add identifierNAME to NXobject #1486

lukaspie · 2024-09-30T08:19:53Z

Implements vote in #1451 (comment).
Depends on #1485

Any implementation should consider comments in #1416

prjemian · 2024-10-04T13:52:04Z

base_classes/NXobject.nxdl.xml

@@ -30,5 +30,25 @@
 		This is the base object of NeXus
 	</doc>
        <!--attribute name="name"><doc>name of instance</doc></attribute-->
+	<field name="identifierNAME" type="NX_CHAR" nameType="partial">


This looks like it fits in the list of reserved prefixes. Add it to that list.

The other prefixes are only defined in the documentation. Should we make them available to XML by adding them here, similar to "identifierNAME"? Or do this in a separate issue/branch/PR?

I added identifier to the reserved prefixes.

The other prefixes are only defined in the documentation. Should we make them available to XML by adding them here, similar to "identifierNAME"? Or do this in a separate issue/branch/PR?

I think it is good to add them to NXobject directly. That still doesn't automatically make them reserved (it is still technically possible to overwrite them somewhere else), but at least the docs can be clearly defined and then inherited.

I suggest to do it in a different PR since for the identifier change we already have a NIAC vote (not sure if the other changes would need a separate one).

rayosborn · 2024-10-04T14:15:11Z

I thought that, in the end, we decided against using "identifierNAME" and in favor of an identifier attribute, just called "identifier". Obviously, @sanbrock has the official record.

prjemian · 2024-10-04T14:22:09Z

If it is an attribute, then it would be added to the nxdl.xsd schema.

rayosborn · 2024-10-04T14:49:11Z

Isn't it possible also to add attributes to the NXobject class at least for groups?

prjemian · 2024-10-04T14:51:09Z

It gets messy there. Can be very precise in the schema.


lukaspie · 2024-10-07T14:00:51Z

I added some of the feedback provided by @paulmillar in #1416 with respect to the type of identifier. He made a strong point that service is not the right word, but rather something like type.

Note that here I am still adding identifierNAME as a field rather than an attribute. My argument for this is that if we were to use an attribute and still keep the service and is_persistent (as was agreed if I recall correctly), we would need to add three attributes like this: identifierNAME, identifier_serviceNAME, identifier_is_persistentNAME, which would be rather messy. Since identifier will be added to the reserved prefixes anyway, why not make it a field with the other two concepts as attributes? Maybe @sanbrock can also chime in with his opinion.

I thought that, in the end, we decided against using "identifierNAME" and in favor of an identifier attribute, just called "identifier". Obviously, @sanbrock has the official record.

My understanding was that it shall be possible to have more than one identifier (e.g., from different services) for one object. I think this is also what @sanbrock noted. Even if identifier will become an attribute, I would suggest to use <attribute name="identifierNAME" type="NX_CHAR" nameType="partial">.

prjemian · 2024-10-07T14:13:03Z

We have been reducing use of type to a few limited situations because its meaning varies with context. XML Schema, XML, NXDL, numeric, data, units, ... (I'm certain I've missed a few.) All of these want to use this noun. Quickly, it became bewildering. At the end of the day, the XML Schema and the NXDL data type win. Here's an example:

definitions/nxdl.xsd

Line 47 in 67d8519

<xs:element name="definition" type="nx:definitionType">

Please pick a noun which is not type.

rayosborn · 2024-10-07T14:26:35Z

At the NIAC, I recall that people (including, I believe, @phyy-nx) thought the "is_persistent" tag unnecessary. We need @sanbrock to confirm, but I also believe that the attributes would just be "identifier" and "service." There is no need to add partial names here, which I agree become very messy. I have no idea whether we specifically voted on any of these issues.

sanbrock · 2024-10-07T14:42:41Z

We have not voted on it, but we agreed that a working group shall come up with a reasonable solution in the form of a PR, which can then be voted on.

sanbrock · 2024-10-07T14:43:39Z

Indeed, we agreed to use type instead of service.

sanbrock · 2024-10-07T14:47:43Z

There was no real conclusion on is_persistent. Indeed, its necessity was debated, although I argued that it is actually good to know whether an identifier is a PID or not.
We may state that the type holds this information. E.g. doi or orcid are good examples of PIDs.

rayosborn · 2024-10-07T14:52:21Z

@sanbrock, do you believe we came to any conclusion about the need for partial names? Attribute names such as "identifierDOI" are pretty ugly IMHO. If there is going to be a working group, perhaps this PR should be, at least temporarily, withdrawn.

sanbrock · 2024-10-07T15:08:25Z

Note that we did agree that our solution shall allow attaching multiple identifiers (e.g. orcid, LinkedIn to a USER).

identifierNAME with nameType=partial would do that job.
identifier as a reserved prefix would also do that, but this solution is rather a convention (it only appears in the documentation and/or in the XSD description of the NeXus Definition Language) than an asserted statement written in NXDL in a definition. While the former requires a hard-coded implementation in all NeXus tools in every programming language used, the latter would be inferred automatically by any NeXus tool that has implemented the interpretation of NeXus definitions written in NXDL. Because of this, I would prefer @identifierNAME with @identifier_typeNAME declared in NXobject over the actually simpler solution of listing identifier among the reserved prefixes.
identifierNAME as a declared field in NXobject would also do the same, but it would allow adding @type to it. This seems to be the most elegant solution. Knowing that only a few identifiers will be provided in practice for a given group, I do not think it has disadvantages compared to using attributes.

sanbrock · 2024-10-07T15:11:36Z

@sanbrock, do you believe we came to any conclusion about the need for partial names? Attribute names such as "identifierDOI" are pretty ugly IMHO. If there is going to be a working group, perhaps this PR should be, at least temporarily, withdrawn.

This is actually a draft PR. And we are the working group. Note that we wanted to invite @paulmillar to this discussion too.

sanbrock · 2024-10-07T15:18:54Z

Note that another alternative is
@identifierTYPE - this allows the use of identifier_orcid, identifier_url, identifier_iri, etc. in the data file, and would make an extra @identifier_typeNAME (or identifier/@type) attribute unnecessary.
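
To illustrate how a tool might resolve such partial names, here is a minimal Python sketch. It assumes that the trailing uppercase run of a nameType="partial" name (NAME, TYPE) stands for a non-empty replaceable part while the lowercase prefix is fixed. The helpers partial_name_to_regex and matching_names are hypothetical and not taken from any existing NeXus tool.

```python
import re


def partial_name_to_regex(nxdl_name: str) -> re.Pattern:
    """Turn a partial NXDL name such as 'identifierNAME' into a regex.

    Assumption (not confirmed by the NXDL schema): the trailing uppercase
    run is the replaceable part and the lowercase prefix is fixed.
    """
    match = re.fullmatch(r"([a-z_0-9]*)([A-Z]+)", nxdl_name)
    if match is None:
        raise ValueError(f"not a 'prefixSUFFIX' style name: {nxdl_name!r}")
    fixed_prefix = match.group(1)
    # The replaceable part may be any non-empty run of name characters.
    return re.compile(re.escape(fixed_prefix) + r"[A-Za-z0-9_]+")


def matching_names(nxdl_name: str, names_in_file: list[str]) -> list[str]:
    """Return the names from a data file that fit the partial NXDL name."""
    pattern = partial_name_to_regex(nxdl_name)
    return [name for name in names_in_file if pattern.fullmatch(name)]
```

With this reading, matching_names("identifierTYPE", [...]) would pick up identifier_orcid and identifier_url from a group, but not a bare identifier or an unrelated name.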

sanbrock · 2024-10-07T15:28:32Z

NIAC suggestions on the rewrite of NXidentifier were recorded in #1451

sanbrock · 2024-10-07T15:35:37Z

These have also been voted on in Session J. See https://www.nexusformat.org/content/NIAC2024_minutes/

paulmillar · 2024-10-07T19:51:11Z

base_classes/NXobject.nxdl.xml

+
+						This refers specifically to an ID in the Handle system operated by the Corporation for National Research Initiatives (CNRI).
+
+						Syntax: hdl:prefix/identifier


Do we need the hdl: prefix in the syntax? I don't think this adds anything useful because the type attribute already indicates the identifier is a Handle.

I suggest the syntax is updated to:

Syntax: prefix/identifier Example: 123456789/abc123

agreed, changed

paulmillar · 2024-10-07T20:02:38Z

base_classes/NXobject.nxdl.xml

+
+						The IGSN is a unique identifier assigned to a specific sample or specimen in the context of scientific research.
+
+						Syntax: https://igsn.org/{IGSN-ID}


I'm not convinced this syntax is correct.

Per the Wikipedia article, IGSNs are now issued by DataCite.

Also, from this Wikipedia article, an example of a DataCite-issued IGSN is 10.58052/SSH000SUA. Note that this is also a DOI.

Per the partnership agreement:

Existing IGSN ID handles will now be registered IGSN ID DOIs and the handles aliased to the DOIs to ensure that these continue to resolve.

I take this to mean that there are now DataCite-issued DOIs for all IGSNs, including those historical IGSNs issued before the partnership.

Therefore, I believe NeXus can use the same syntax for DOIs and IGSNs.

I agree, this was the older syntax. I added some text describing this change:

Since 2021, IGSNs are issued by DataCite, meaning that there are now DataCite-issued DOIs for all IGSNs, including those historical IGSNs issued beforehand. Therefore, the syntax is the same as for DOIs. Syntax: 10.XXXX/XXXXXX Example: 10.1107/S1600576714027575

paulmillar · 2024-10-07T20:07:35Z

base_classes/NXobject.nxdl.xml

+						An ISNI is made up of 16 digits, the last character being a check character. The check character may be either a decimal digit
+						or the character “X”. 
+
+						Syntax: https://isni.org/isni/{ISNI-ID}


Do we really need the https://isni.org/isni/ prefix?

Couldn't we store the ISNI-ID number and document that a URL may be derived from the ISNI by adding the prefix?

i.e.,
Syntax: 16 base-10 digits stored without any spaces.

Apparently, specifying the lack of spaces is important.

Sounds good. I used

A URL can be generated from the ISNI ID by combining it with the prefix https://isni.org/isni/, resulting in https://isni.org/isni/{ISNI-ID}. Syntax: 16 base-10 digits stored without any spaces. Example: 0000000121032683
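
As a rough illustration of the stored form and the URL derivation, here is a minimal Python sketch. It assumes the ISNI check character follows ISO 7064 MOD 11-2; the function names are hypothetical and nothing here is part of the NXDL change itself.

```python
ISNI_URL_PREFIX = "https://isni.org/isni/"


def isni_check_character(first_15_digits: str) -> str:
    """Compute the check character for the first 15 digits.

    Assumption: ISNI uses the ISO 7064 MOD 11-2 scheme.
    """
    total = 0
    for char in first_15_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(value: str) -> bool:
    """Accept only the stored form: 16 characters and no spaces."""
    if len(value) != 16:
        return False
    body = value[:15]
    if not (body.isascii() and body.isdigit()):
        return False
    return value[15] == isni_check_character(body)


def isni_to_url(value: str) -> str:
    """Derive the resolvable URL by prepending the documented prefix."""
    if not is_valid_isni(value):
        raise ValueError(f"not a valid ISNI: {value!r}")
    return ISNI_URL_PREFIX + value
```

Storing only the 16 characters keeps the value independent of the resolver, and the URL can still be rebuilt on demand.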

paulmillar · 2024-10-07T20:10:19Z

base_classes/NXobject.nxdl.xml

+
+						An ISSN is an 8-digit unique identifier used to distinguish a serial publication, whether in print or electronic form.
+
+						Syntax: ISSN XXXX-XXXX


As above, I don't think the prefix ISSN in the syntax adds anything useful as the type attribute already identifies the value as an ISSN.

Also (borrowing from Wikipedia) the syntax might be better expressed as NNNN-NNNC

where N is in the set {0,1,2,...,9}, a decimal digit character, and C is in {0,1,2,...,9,X}.

Sounds good, I added text describing that the last digit is a check character and changed the syntax as you suggested.
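
For reference, the NNNN-NNNC form could be checked along these lines. This is a minimal Python sketch, assuming the standard ISSN weighting (weights 8 down to 2 on the first seven digits, modulo 11); the function names are hypothetical.

```python
import re

# NNNN-NNNC: seven decimal digits plus a check character (digit or X).
ISSN_PATTERN = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")


def issn_check_character(first_seven_digits: str) -> str:
    """Compute the check character from the first seven digits."""
    weights = range(8, 1, -1)  # 8, 7, 6, 5, 4, 3, 2
    weighted_sum = sum(int(d) * w for d, w in zip(first_seven_digits, weights))
    result = (11 - weighted_sum % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_issn(value: str) -> bool:
    """Check both the NNNN-NNNC layout and the check character."""
    if ISSN_PATTERN.fullmatch(value) is None:
        return False
    digits = value.replace("-", "")
    return digits[7] == issn_check_character(digits[:7])
```

A value that still carries the ISSN prefix fails the layout check, which matches the decision to store the bare number only.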

paulmillar · 2024-10-07T20:18:23Z

base_classes/NXobject.nxdl.xml

+			Persistent identifiers are also known as PIDs.
+		</doc>
+		<attribute name="type" type="NX_CHAR">
+			<doc>The type of identifier used.</doc>


I think it would make sense to include a comment about using the most specific type when describing the identifier.

For example, all IGSNs are DOIs and all DOIs are Handles; however, an IGSN should have type IGSN (and not DOI or Hdl).

Similarly, ARK, Purl, ORCID and ROR identifiers should have their corresponding types and should not use the more generic URL identifier.

fully agreed, I added a recommendation along the lines you describe here

paulmillar · 2024-10-07T20:23:25Z

base_classes/NXobject.nxdl.xml

+				</item>
+				<item value="URN">
+					<doc>
+						Uniform Resource Name


We probably should mention that identifiers with more specific type attribute values should not be stored as a URN, even when this is valid.

The two examples that spring to mind are DOI and ISSN, but there may be others with valid URN namespaces.

The doi URN namespace has been registered, so the URN doi:10.1107/S1600576714027575 is a valid URN-based representation for the DOI 10.1107/S1600576714027575.

Similarly, the ISSN URN namespace has been registered, so the URN URN:ISSN:1234-1231 is a valid URN that refers to the ISSN 1234-1231.

However, I believe we shouldn't allow DOIs (and IGSNs) and ISSNs to be written using the URN type, but rather with their more specific types.

Added

It is recommended that identifiers with a more specific type attribute value (such as DOI or ISSN) should not be stored as a URN, even when this is valid. As an example, the URN doi:10.1107/S1600576714027575 is a valid URN-based representation for the DOI 10.1107/S1600576714027575, but it is strongly recommended to use type="DOI" in this case.
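
A writer or validator could apply this recommendation roughly as follows. This is only a Python sketch: the function prefer_specific_type is hypothetical, and it handles just the two URN forms mentioned in this thread (doi: and URN:ISSN:).

```python
def prefer_specific_type(identifier_type: str, value: str) -> tuple[str, str]:
    """Map a URN-typed DOI or ISSN to its more specific type.

    Only the two prefixes discussed in this thread are recognized; any
    other value is returned unchanged.
    """
    if identifier_type.upper() != "URN":
        return identifier_type, value
    lowered = value.lower()
    if lowered.startswith("urn:issn:"):
        return "ISSN", value[len("urn:issn:"):]
    if lowered.startswith("doi:"):
        return "DOI", value[len("doi:"):]
    return identifier_type, value
```

For instance, a value of doi:10.1107/S1600576714027575 with type="URN" would be rewritten to the bare DOI with type="DOI", while other URNs pass through untouched.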

paulmillar · 2024-10-07T20:40:07Z

manual/source/datarules.rst

-``SAS_``      attributes          reserved for use by canSAS                    https://www.cansas.org
-``SILX_``     attributes          reserved for use by silx                      https://www.silx.org
-============  ==================  ============================================  =============================================================
+    reserved prefixes; identifier


It looks like there are some formatting changes.

IMHO, it would be better to separate out layout/formatting changes from content-modifying changes. This patch should add the identifier row, but not modify the existing rows (if possible).

I basically just changed the length of the ============ part to accommodate identifier. Unfortunately, the doc build currently does not work (because of the new nameType="partial"), but we can certainly check afterwards if that formatting change is absolutely necessary.

lukaspie · 2024-10-16T09:50:58Z

CI/CD failing because multi-line doc handling is not implemented (see #1491)

phyy-nx mentioned this pull request Sep 30, 2024

NXenvironment #1451

Open

This was referenced Oct 1, 2024

FAIRmat 2024: additional base classes in NXinstrument #1419

Draft

Fairmat 2024: use NXidentifier in NXuser #1416

Closed

prjemian reviewed Oct 4, 2024

prjemian mentioned this pull request Oct 4, 2024

reserved prefixes are only defined in the documentation #1492

Open

lukaspie mentioned this pull request Oct 7, 2024

Fairmat 2024: several new base classes in NXsample and NXsample_component #1413

Open

paulmillar reviewed Oct 7, 2024

lukaspie force-pushed the identifier-in-nxobject branch from f7e3efd to 95bdb03 Compare October 16, 2024 08:32

lukaspie added 5 commits October 17, 2024 15:56

add identifierNAME to NXobject

abce47e

typo fix

5290722

add identifier to reserved prefixes

d00b007

implement review from NXuser PR

0a5a797

use reserved prefixes

00de75b

lukaspie added 4 commits October 17, 2024 15:56

typo fix

c4c91b2

add docs for identifier type

fe5652d

remove is_persistent

8b32d97

more specific examples for identifier types

82e2136

lukaspie force-pushed the identifier-in-nxobject branch from 95bdb03 to 82e2136 Compare October 17, 2024 13:57

small docs reformatting

04416fe

lukaspie marked this pull request as ready for review October 17, 2024 15:03

