For discussion - add identifier scheme for pids #299

Open
agbeltran opened this issue Aug 25, 2022 · 1 comment

Comments

@agbeltran
Member

We now have pid fields for the relevant entities. These pids may come from different schemes: for affiliations, for example, we may have ROR or ISNI identifiers. If facilities rely on more than one scheme, it would be useful to include a field for the `pid_scheme` being used.

@RKrahl
Member

RKrahl commented Nov 9, 2022

I don't believe we need to have such a pid_scheme attribute for this. A proper use of the respective pid attribute is sufficient to disambiguate.

At HZB, I follow the convention of always adding a scheme prefix, separated by a colon, to the pid or doi value. For example, the ICAT content for one of our data publications looks like this (output trimmed for brevity):

>>> query = Query(client, "DataPublication", conditions={"pid": "= 'DOI:10.5442/ND000006'"}, includes=["fundingReferences.funding", "relatedItems", "users.affiliations"])
>>> client.assertedSearch(query)[0]
(dataPublication){
   # …
   description = "…"
   fundingReferences[] = 
      (dataPublicationFunding){
         # …
         funding = 
            (fundingReference){
               # …
               awardNumber = "ExNet-0042-Phase-2-3"
               funderIdentifier = "Crossref Funder ID:10.13039/501100001656"
               funderName = "Helmholtz Association"
            }
      },
      (dataPublicationFunding){
         # …
         funding = 
            (fundingReference){
               # …
               awardNumber = ":unas"
               funderName = "Helmholtz Einstein International Berlin Research School in Data Science (HEIBRiDS)"
            }
      },
      (dataPublicationFunding){
         # …
         funding = 
            (fundingReference){
               # …
               awardNumber = "0324247"
               funderIdentifier = "Crossref Funder ID:10.13039/501100006360"
               funderName = "Federal Ministry for Economic Affairs and Energy"
            }
      },
   pid = "DOI:10.5442/ND000006"
   publicationDate = 2021-06-28 00:00:00+02:00
   relatedItems[] = 
      (relatedItem){
         # …
         fullReference = "Cariou, Romain et al. III–V-on-silicon solar cells reaching 33% photoconversion efficiency in two-terminal configuration. Nat Energy 3, 326–333 (2018). https://doi.org/10.1038/s41560-018-0125-0"
         identifier = "DOI:10.1038/s41560-018-0125-0"
         relatedItemType = "JournalArticle"
         relationType = "Cites"
         title = "III–V-on-silicon solar cells reaching 33% photoconversion efficiency in two-terminal configuration"
      },
      (relatedItem){
         # …
         fullReference = "Bläsi, Benedikt et al. Photonic structures for III-V//Si multijunction solar cells with efficiency >33%. Proc. SPIE 10688, Photonics for Solar Energy Systems VII, 1068803 (2018). https://doi.org/10.1117/12.2307831"
         identifier = "DOI:10.1117/12.2307831"
         relatedItemType = "JournalArticle"
         relationType = "Cites"
         title = "Photonic structures for III-V//Si multijunction solar cells with efficiency >33%"
      },
      (relatedItem){
         # …
         fullReference = "Tillmann, Peter et al (2021): Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells. Optics Express. https://doi.org/10.1364/OE.426761"
         identifier = "DOI:10.1364/OE.426761"
         relatedItemType = "JournalArticle"
         relationType = "IsSupplementTo"
         title = "Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells"
      },
      (relatedItem){
         # …
         fullReference = "Tillmann, Peter et al (2021): Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells. Zenodo. https://doi.org/10.5281/zenodo.5013230"
         identifier = "DOI:10.5281/zenodo.5013230"
         relatedItemType = "Software"
         relationType = "IsReferencedBy"
         title = "Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells"
      },
   subject = "multi-junction solar cell; optical simulations; finite element method; light trapping; light management; nanotextures; metal grating"
   title = "Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells"
   users[] = 
      # …
      (dataPublicationUser){
         # …
         affiliations[] = 
            (affiliation){
               # …
               fullReference = "JCMwave GmbH, Bolivarallee 22, 14050 Berlin"
               name = "01: JCMwave"
            },
            (affiliation){
               # …
               fullReference = "Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin"
               name = "02: ZIB"
               pid = "ROR:02eva5865"
            },
         contributorType = "Creator"
         familyName = "Hammerschmidt"
         fullName = "Hammerschmidt, Martin"
         givenName = "Martin"
         orderKey = "004"
      },
      (dataPublicationUser){
         # …
         affiliations[] = 
            (affiliation){
               # …
               fullReference = "Optics for Solar Energy, Helmholtz-Zentrum Berlin für Materialien und Energie, Albert-Einstein-Straße 16, 12489 Berlin"
               name = "01: HZB"
               pid = "ROR:02aj13c28"
            },
            (affiliation){
               # …
               fullReference = "Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin"
               name = "02: ZIB"
               pid = "ROR:02eva5865"
            },
         contributorType = "Creator"
         familyName = "Tillmann"
         fullName = "Tillmann, Peter"
         givenName = "Peter"
         orderKey = "001"
      },
      (dataPublicationUser){
         # …
         affiliations[] = 
            (affiliation){
               # …
               fullReference = "Fraunhofer Institute for Solar Energy Systems ISE, Heidenhofstr. 2, 79110 Freiburg, Germany"
               name = "01: Fraunhofer ISE"
               pid = "ROR:02kfzvh91"
            },
         contributorType = "Creator"
         familyName = "Bläsi"
         fullName = "Bläsi, Benedikt"
         givenName = "Benedikt"
         orderKey = "002"
      },
      (dataPublicationUser){
         # …
         affiliations[] = 
            (affiliation){
               # …
               fullReference = "JCMwave GmbH, Bolivarallee 22, 14050 Berlin"
               name = "01: JCMwave"
            },
            (affiliation){
               # …
               fullReference = "Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin"
               name = "02: ZIB"
               pid = "ROR:02eva5865"
            },
         contributorType = "Creator"
         familyName = "Burger"
         fullName = "Burger, Sven"
         givenName = "Sven"
         orderKey = "003"
      },
 }

As you can see, I have multiple different types of PIDs in the data: DOIs, Crossref Funder IDs, and RORs in this case. Note that the Crossref Funder IDs are actually DOIs, but are still handled separately.
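
To illustrate that the prefix alone is enough to tell the schemes apart, here is a minimal sketch that splits a few of the identifier values from the output above at the first colon and counts them per scheme. The helper name `split_scheme` is hypothetical and not part of ICAT or python-icat; the sketch assumes, as the convention does, that the scheme part never contains a colon.

```python
from collections import Counter

# Identifier values copied from the ICAT output above.
identifiers = [
    "DOI:10.5442/ND000006",
    "Crossref Funder ID:10.13039/501100001656",
    "Crossref Funder ID:10.13039/501100006360",
    "DOI:10.1038/s41560-018-0125-0",
    "ROR:02eva5865",
    "ROR:02aj13c28",
]


def split_scheme(value):
    """Split 'scheme:id' at the first colon (hypothetical helper)."""
    scheme, sep, ident = value.partition(":")
    if not sep:
        raise ValueError("%s: missing scheme prefix" % value)
    return scheme, ident


# Count how many identifiers use each scheme.
schemes = Counter(split_scheme(v)[0] for v in identifiers)
for scheme, count in sorted(schemes.items()):
    print(scheme, count)
```

Although a Crossref Funder ID has the same syntax as a DOI, the prefix keeps the two apart without any additional attribute.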

The script that generates the landing pages has a helper class to deal with that:

class PID:
    """Generalization of a persistent identifier.
    """

    SchemeURI = {
        "DOI": "https://doi.org/",
        "arXiv": "https://arxiv.org/abs/",
        "ORCID": "https://orcid.org/",
        "ROR": "https://ror.org/",
        "Crossref Funder ID": "https://doi.org/",
        "PaNET": "http://purl.org/pan-science/PaNET/",
        "URL": "",
    }

    def __init__(self, identifier, scheme=None):
        # Unless the scheme is overridden, this code assumes the
        # identifier to be scheme and id separated by a colon and that
        # the scheme part does not contain a colon.
        if scheme:
            self._type, self._id = scheme, identifier
        else:
            self._type, self._id = identifier.split(':', maxsplit=1)
        if self._type not in self.SchemeURI:
            raise ValueError("%s: unknown identifier type" % identifier)

    @property
    def identifierType(self):
        return self._type

    @property
    def identifier(self):
        return self._id

    @property
    def schemeURI(self):
        return self.SchemeURI[self._type] or None

    @property
    def url(self):
        return self.SchemeURI[self._type] + self._id

This helper is able to deal properly with all different types and cases:

>>> p = PID("Crossref Funder ID:10.13039/501100001656")
>>> p.identifierType
'Crossref Funder ID'
>>> p.identifier
'10.13039/501100001656'
>>> p.schemeURI
'https://doi.org/'
>>> p.url
'https://doi.org/10.13039/501100001656'
>>> p = PID("DOI:10.5442/ND000006")
>>> p.identifierType
'DOI'
>>> p.identifier
'10.5442/ND000006'
>>> p.schemeURI
'https://doi.org/'
>>> p.url
'https://doi.org/10.5442/ND000006'
>>> p = PID("ROR:02eva5865")
>>> p.identifierType
'ROR'
>>> p.identifier
'02eva5865'
>>> p.schemeURI
'https://ror.org/'
>>> p.url
'https://ror.org/02eva5865'
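
The session above does not exercise the `scheme` argument or the failure cases, so here is a small sketch of both. It uses a condensed copy of the class (fewer schemes and properties) so that it runs on its own; the example URL and the `try_parse` helper are hypothetical and only serve as illustration.

```python
class PID:
    """Condensed copy of the helper class above, for illustration only."""

    SchemeURI = {
        "DOI": "https://doi.org/",
        "ROR": "https://ror.org/",
        "URL": "",
    }

    def __init__(self, identifier, scheme=None):
        if scheme:
            self._type, self._id = scheme, identifier
        else:
            self._type, self._id = identifier.split(':', maxsplit=1)
        if self._type not in self.SchemeURI:
            raise ValueError("%s: unknown identifier type" % identifier)

    @property
    def schemeURI(self):
        return self.SchemeURI[self._type] or None

    @property
    def url(self):
        return self.SchemeURI[self._type] + self._id


def try_parse(value):
    """Return a PID, or None if the value cannot be parsed (hypothetical helper)."""
    try:
        return PID(value)
    except ValueError:
        return None


# With an explicit scheme, the value is taken as is and no prefix is parsed.
p = PID("https://example.org/dataset/42", scheme="URL")
print(p.url)        # the URL itself, since the "URL" scheme has an empty base
print(p.schemeURI)  # None, because the empty base is mapped to None

# A value without a prefix fails when unpacking the result of split().
print(try_parse("10.5442/ND000006"))
# A bare URL is split at its first colon, giving the unknown scheme "https".
print(try_parse("https://example.org/dataset/42"))
```

Both failure cases raise `ValueError`, so a caller can treat them the same way.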

For example, the snippet that adds relatedIdentifiers to the DataCite XML used for the landing pages looks like this:

if self.relatedItems:
    relatedIds = etree.SubElement(datacite, "relatedIdentifiers")
    for r in self.relatedItems:
        pid = PID(r['identifier'])
        rId = etree.SubElement(relatedIds, "relatedIdentifier")
        rId.set("relatedIdentifierType", pid.identifierType)
        rId.set("relationType", r['relationType'])
        rId.text = pid.identifier

It works the same for any PID type.
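
For anyone who wants to try this outside the landing page script, here is a self-contained sketch of the same loop. It makes several assumptions: it uses the standard library's `xml.etree.ElementTree` (the `etree` in the snippet above may be a different library), it uses a bare `resource` root element without the DataCite namespace, it replaces the `PID` class by a plain split at the first colon, and it uses a hand-written `related_items` list with two of the related items from the output above.

```python
import xml.etree.ElementTree as etree

# Two related items from the ICAT output above, reduced to the
# attributes that the loop needs.
related_items = [
    {"identifier": "DOI:10.1364/OE.426761", "relationType": "IsSupplementTo"},
    {"identifier": "DOI:10.5281/zenodo.5013230", "relationType": "IsReferencedBy"},
]

# Simplified root element; a real DataCite record needs namespaces
# and further mandatory elements.
datacite = etree.Element("resource")

related_ids = etree.SubElement(datacite, "relatedIdentifiers")
for r in related_items:
    # Stand-in for PID(r['identifier']): split at the first colon.
    scheme, _, ident = r["identifier"].partition(":")
    r_id = etree.SubElement(related_ids, "relatedIdentifier")
    r_id.set("relatedIdentifierType", scheme)
    r_id.set("relationType", r["relationType"])
    r_id.text = ident

xml_text = etree.tostring(datacite, encoding="unicode")
print(xml_text)
```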
