Inject an additional element with environment information into the input data in IngestReader #149

Merged (11 commits) on Feb 28, 2024
5 changes: 5 additions & 0 deletions CHANGES.rst
@@ -14,6 +14,9 @@ New features
processing the input in custom versions of
:class:`icat.ingest.IngestReader`.

+ `#148`_, `#149`_: Inject an additional element with environment
information into the input data in :class:`icat.ingest.IngestReader`.

+ `#146`_, `#147`_: Better error handling in
:class:`icat.ingest.IngestReader`.

@@ -40,6 +43,8 @@ Bug fixes and minor changes
.. _#145: https://github.com/icatproject/python-icat/pull/145
.. _#146: https://github.com/icatproject/python-icat/issues/146
.. _#147: https://github.com/icatproject/python-icat/pull/147
.. _#148: https://github.com/icatproject/python-icat/issues/148
.. _#149: https://github.com/icatproject/python-icat/pull/149


.. _changes-1_2_0:
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -17,6 +17,7 @@ include doc/tutorial/*.py
include etc/ingest-*.xsd
include etc/ingest.xslt
include tests/conftest.py
include tests/data/ingest-env.xslt
include tests/data/legacy-icatdump-*.xml
include tests/data/legacy-icatdump-*.yaml
include tests/data/metadata-*.xml
51 changes: 51 additions & 0 deletions doc/src/ingest.rst
@@ -44,6 +44,57 @@ objects read from the input file in ICAT.
:show-inheritance:


.. _ingest-process:

Ingest process
--------------

The processing of ingest files during the instantiation of an
:class:`~icat.ingest.IngestReader` object may be summarized with the
following steps:

1. Read the metadata and parse it into an
   :class:`lxml.etree._ElementTree`.

2. Call :meth:`~icat.ingest.IngestReader.get_xsd` to get the
   appropriate XSD file and validate the metadata against that schema.

3. Inject an ``_environment`` element as first child of the root
   element, see below.

4. Call :meth:`~icat.ingest.IngestReader.get_xslt` to get the
   appropriate XSLT file and transform the metadata into generic ICAT
   data XML file format.

5. Feed the result of the transformation into the parent class
   :class:`~icat.dumpfile_xml.XMLDumpFileReader`.

Once this initialization is done,
:meth:`~icat.ingest.IngestReader.ingest` may be called to read the
individual objects defined in the metadata.
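
The injection step can be illustrated in isolation. The following is only a minimal sketch, not the code from this pull request: it assumes the standard library's ``xml.etree.ElementTree`` in place of lxml, and the input document and the version string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal ingest document; real input must validate
# against the XSD selected by get_xsd().
INGEST_XML = """<icatingest version="1.0">
  <head/>
  <data/>
</icatingest>"""


def add_environment(root, env):
    """Insert an _environment element as first child of the root element."""
    # Each key of env becomes an attribute of the new element.
    env_elem = ET.Element("_environment", **env)
    root.insert(0, env_elem)
    return env_elem


root = ET.fromstring(INGEST_XML)
# "5.0.1" is an assumed value standing in for str(client.apiversion).
add_environment(root, {"icat_version": "5.0.1"})
print([child.tag for child in root])
```

Because the element is inserted after validation and before the transformation, the input schema does not need to know about it, while the XSLT can still read its attributes.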


.. _ingest-environment:

The environment element
-----------------------

During the processing of ingest files, an ``_environment`` element
will be injected as the first child of the root element of the input
data.  In the current version of python-icat, this ``_environment``
element has the following attributes:

`icat_version`
    Version of the ICAT server this client connects to, i.e. the
    :attr:`icat.client.Client.apiversion` attribute of the `client`
    object being used by this :class:`~icat.ingest.IngestReader`.

More attributes may be added in future versions.  This
``_environment`` element may be used by the XSLT in order to adapt the
result of the transformation to the environment, in particular to
adapt the output to the ICAT schema version it is supposed to conform
to.
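
A custom reader could presumably add attributes of its own by overriding ``get_environment``. The sketch below only illustrates that idea under stated assumptions: ``ReaderSketch`` is a stand-in for :class:`~icat.ingest.IngestReader` reduced to this one method, ``FakeClient`` replaces a real client, and the ``facility`` attribute is hypothetical.

```python
class ReaderSketch:
    """Stand-in for icat.ingest.IngestReader, reduced to get_environment()."""

    def get_environment(self, client):
        return dict(icat_version=str(client.apiversion))


class MyReader(ReaderSketch):
    """Hypothetical custom reader adding one more environment attribute."""

    def get_environment(self, client):
        env = super().get_environment(client)
        # "facility" is an assumed attribute name, not part of python-icat.
        env["facility"] = "ExampleFacility"
        return env


class FakeClient:
    """Stand-in for icat.client.Client; only apiversion is needed here."""
    apiversion = "5.0.1"


env = MyReader().get_environment(FakeClient())
print(env)
```

All values end up as XML attributes, so they should be strings.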


.. _ingest-example:

Ingest example
2 changes: 2 additions & 0 deletions etc/ingest.xslt
@@ -10,6 +10,8 @@
</icatdata>
</xsl:template>

<xsl:template match="/icatingest/_environment"/>

<xsl:template match="/icatingest/head"/>

<xsl:template match="/icatingest/data">
33 changes: 33 additions & 0 deletions src/icat/ingest.py
@@ -73,6 +73,10 @@ class IngestReader(XMLDumpFileReader):
    .. versionchanged:: 1.3.0
        drop class attribute :attr:`~icat.ingest.IngestReader.XSLT_name`
        in favour of :attr:`~icat.ingest.IngestReader.XSLT_Map`.

    .. versionchanged:: 1.3.0
        inject an element `_environment` as first child of the root
        element into the input data.
    """

    SchemaDir = Path("/usr/share/icat")
@@ -110,6 +114,7 @@ def __init__(self, client, metadata, investigation):
            schema = etree.XMLSchema(etree.parse(f))
        if not schema.validate(ingest_data):
            raise InvalidIngestFileError("validation failed")
        self.add_environment(client, ingest_data)
        with self.get_xslt(ingest_data).open("rb") as f:
            xslt = etree.XSLT(etree.parse(f))
        super().__init__(client, xslt(ingest_data))
@@ -180,6 +185,34 @@ def get_xslt(self, ingest_data):
            raise InvalidIngestFileError("unknown format")
        return self.SchemaDir / xslt

    def get_environment(self, client):
        """Get the environment to be injected as an element into the input.

        :param client: the client object being used by this
            IngestReader.
        :type client: :class:`icat.client.Client`
        :return: the environment.
        :rtype: :class:`dict`

        .. versionadded:: 1.3.0
        """
        return dict(icat_version=str(client.apiversion))

    def add_environment(self, client, ingest_data):
        """Inject environment information into input data.

        :param client: the client object being used by this
            IngestReader.
        :type client: :class:`icat.client.Client`
        :param ingest_data: input data
        :type ingest_data: :class:`lxml.etree._ElementTree`

        .. versionadded:: 1.3.0
        """
        env = self.get_environment(client)
        env_elem = etree.Element("_environment", **env)
        ingest_data.getroot().insert(0, env_elem)

    def getobjs_from_data(self, data, objindex):
        typed_objindex = set()
        for key, obj in super().getobjs_from_data(data, objindex):
14 changes: 14 additions & 0 deletions tests/conftest.py
@@ -159,6 +159,20 @@ def require_dumpfile_backend(backend):
        _skip("need %s backend for icat.dumpfile" % (backend))


def get_icatdata_schema():
    if icat_version < "4.4":
        fname = "icatdata-4.3.xsd"
    elif icat_version < "4.7":
        fname = "icatdata-4.4.xsd"
    elif icat_version < "4.10":
        fname = "icatdata-4.7.xsd"
    elif icat_version < "5.0":
        fname = "icatdata-4.10.xsd"
    else:
        fname = "icatdata-5.0.xsd"
    return gettestdata(fname)
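
Note on the comparisons above: they only select the right schema if `icat_version` is a version-aware object rather than a plain `str`, which is assumed here and not shown in this hunk. As a rough sketch of why that matters, the hypothetical helpers `parse_version` and `schema_for` below reproduce the same selection with integer tuples:

```python
def parse_version(s):
    """Turn a dotted version string such as "4.10.0" into a tuple of ints."""
    return tuple(int(part) for part in s.split("."))


def schema_for(version):
    """Pick the icatdata schema file name for a given ICAT version string."""
    v = parse_version(version)
    if v < (4, 4):
        return "icatdata-4.3.xsd"
    elif v < (4, 7):
        return "icatdata-4.4.xsd"
    elif v < (4, 10):
        return "icatdata-4.7.xsd"
    elif v < (5, 0):
        return "icatdata-4.10.xsd"
    else:
        return "icatdata-5.0.xsd"


# Plain string comparison orders "4.10" before "4.7", which is wrong
# for version numbers; tuple comparison gets it right.
string_order = "4.10" < "4.7"
tuple_order = parse_version("4.10") < parse_version("4.7")
print(string_order, tuple_order)
print(schema_for("4.10.0"))
```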


def get_reference_dumpfile(ext = "yaml"):
    require_icat_version("4.4.0", "oldest available set of test data")
    if icat_version < "4.7":
59 changes: 59 additions & 0 deletions tests/data/ingest-env.xslt
@@ -0,0 +1,59 @@
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml"/>

<xsl:template match="/icatingest">
<icatdata>
<xsl:apply-templates/>
</icatdata>
</xsl:template>

<xsl:template match="/icatingest/_environment"/>

<xsl:template match="/icatingest/head">
<head>
<date>2024-01-22T14:30:51+01:00</date>
<apiversion>
<xsl:copy-of select="string(/icatingest/_environment/@icat_version)"/>
</apiversion>
<generator>ingest-env.xslt</generator>
</head>
</xsl:template>

<xsl:template match="/icatingest/data">
<data>
<xsl:apply-templates/>
</data>
</xsl:template>

<xsl:template match="/icatingest/data/dataset">
<dataset>
<xsl:copy-of select="@id"/>
<complete>false</complete>
<xsl:copy-of select="description"/>
<xsl:copy-of select="endDate"/>
<xsl:copy-of select="name"/>
<xsl:copy-of select="startDate"/>
<investigation ref="_Investigation"/>
<xsl:apply-templates select="sample"/>
<type name="raw"/>
<xsl:copy-of select="datasetInstruments"/>
<xsl:copy-of select="datasetTechniques"/>
<xsl:copy-of select="parameters"/>
</dataset>
</xsl:template>

<xsl:template match="/icatingest/data/dataset/sample">
<xsl:copy>
<xsl:attribute name="investigation.ref">_Investigation</xsl:attribute>
<xsl:copy-of select="@*"/>
</xsl:copy>
</xsl:template>

<xsl:template match="*">
<xsl:copy-of select="."/>
</xsl:template>

</xsl:stylesheet>
2 changes: 2 additions & 0 deletions tests/data/myingest.xslt
@@ -10,6 +10,8 @@
</icatdata>
</xsl:template>

<xsl:template match="/myingest/_environment"/>

<xsl:template match="/myingest/head"/>

<xsl:template match="/myingest/data">