diff --git a/CHANGES.rst b/CHANGES.rst index 5c6cb3a..cc44a85 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -32,7 +32,7 @@ Incompatible changes Bug fixes and minor changes --------------------------- -+ `#141`_, `#142`_: Review documentation. ++ `#141`_, `#142`_, `#150`_: Review documentation. + `#145`_: Review build tool chain. @@ -45,6 +45,7 @@ Bug fixes and minor changes .. _#147: https://github.com/icatproject/python-icat/pull/147 .. _#148: https://github.com/icatproject/python-icat/issues/148 .. _#149: https://github.com/icatproject/python-icat/pull/149 +.. _#150: https://github.com/icatproject/python-icat/pull/150 .. _changes-1_2_0: diff --git a/doc/src/conf.py b/doc/src/conf.py index 1496d62..3b1fe26 100644 --- a/doc/src/conf.py +++ b/doc/src/conf.py @@ -68,8 +68,12 @@ def make_meta_rst(last_release): extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.intersphinx', - 'sphinx_copybutton', ] +try: + import sphinx_copybutton + extensions.append('sphinx_copybutton') +except ImportError: + pass # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] @@ -80,6 +84,17 @@ def make_meta_rst(last_release): # The master toctree document. master_doc = 'index' +# Enable automatic numbering of figures, tables and code-blocks +numfig = True + +# Strings to format figure, table, code-block, and section numbers +numfig_format = { + 'figure': "Figure %s", + 'table': "Table %s", + 'code-block': "Snippet %s", + 'section': "Section %s", +} + # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. 
# @@ -102,7 +117,10 @@ def make_meta_rst(last_release): # -- Options for intersphinx extension --------------------------------------- -intersphinx_mapping = {'python': ('https://docs.python.org/3', None)} +intersphinx_mapping = { + 'python': ('https://docs.python.org/3', None), + 'lxml': ('https://lxml.de/apidoc/', None), +} # -- Options for HTML output ------------------------------------------------- diff --git a/doc/src/config.rst b/doc/src/config.rst index af59721..14688e5 100644 --- a/doc/src/config.rst +++ b/doc/src/config.rst @@ -171,9 +171,9 @@ A few derived variables are also set in | `promptPass` | ``-P``, ``--prompt-pass`` | | :const:`False` | no | \(3),(4),(5) | +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -See the table for an overview of predefined configuration variables. -Mandatory means that an error will be raised in -:meth:`icat.config.Config.getconfig` if no value is found for the +See :numref:`tab-config-vars` for an overview of predefined +configuration variables. Mandatory means that an error will be raised +in :meth:`icat.config.Config.getconfig` if no value is found for the configuration variable in question. Notes: diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index 878c87f..c4894c0 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -62,9 +62,8 @@ corresponding Grouping objects. References to ICAT objects and unique keys ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -References to related objects are encoded in ICAT data files with -reference keys. There are two kinds of those keys, local keys and -unique keys: +References to ICAT objects may be encoded using reference keys. There +are two kinds of those keys, local keys and unique keys: When an ICAT object is defined in the file, it generally defines a local key at the same time. 
Local keys are stored in the object index @@ -86,12 +85,6 @@ Reference keys should be considered as opaque ids. ICAT data XML files ~~~~~~~~~~~~~~~~~~~ -In this section we describe the ICAT data file format using the XML -backend. Consider the following example: - -.. literalinclude:: ../examples/icatdump-simple.xml - :language: xml - The root element of ICAT data XML files is ``icatdata``. It may optionally have one ``head`` subelement and one or more ``data`` subelements. @@ -100,108 +93,158 @@ The ``head`` element will be ignored by :ref:`icatingest`. It serves to provide some information on the context of the creation of the data file, which may be useful for debugging in case of issues. -The content of each ``data`` element is one chunk, its subelements are -the ICAT object definitions according to the logical structure -explained above. The present example contains two chunks: the first -chunk contains four User objects and three Grouping objects. The -Groupings include related UserGroups. The second chunk only contains -one Investigation, including related InvestigationGroups. - -The object elements may have an ``id`` attribute that define a local -key to reference the object later on. The subelements of the object -elements correspond to the object's attributes and relations according -to the ICAT schema. All many-to-one relations must be provided and -reference already existing objects, e.g. they must either already have -existed before starting the ingestion or appear earlier in the ICAT -data file than the referencing object, so that they will be created -earlier. The related object may either be referenced by reference key -using the ``ref`` attribute or by the related object's attribute -values, using XML attributes of the same name. In the latter case, -the attribute values must uniquely define the related object. 
- -Consider a simplified version of the first chunk from the present -example, defining only one User, Grouping and UserGroup respectively: - -.. code-block:: XML - - - - Goethe University Frankfurt, Faculty of Philosophy and History - ahau@example.org - Hau - Arnold Hau - Arnold - db/ahau - 0000-0002-3263 - - - investigation_10100601-ST_owner - - - - - - -The Grouping includes the related UserGroup object that in turn -references the related User. This User is referenced in the ``ref`` -attribute using a local key defined in the User's ``id`` attribute. -Note that the UserGroup does not include its relation with Grouping. -The latter relationship is implied by the parent relation of the -object in the file. - -As an alternative, the UserGroup could have been added to the file as -separate object as direct subelement of ``data``: - -.. code-block:: XML - - - - Goethe University Frankfurt, Faculty of Philosophy and History - ahau@example.org - Hau - Arnold Hau - Arnold - db/ahau - 0000-0002-3263 - - - investigation_10100601-ST_owner - - - - - - - -Another example is how the Investigation references its Facility: - -.. code-block:: XML - - - - - - +The actual payload of an ICAT data XML file is in the ``data`` +elements. There can be any number of them and each is one chunk +according to the logical structure explained above. The subelements +of ``data`` may either be ICAT object references or ICAT object +definitions, both explained in detail below. Either of them may have +an ``id`` attribute that defines a local key that can be used to +reference the corresponding object later on. -The Facility is not defined in the data file. It is assumed to exist -in ICAT before ingesting the file. In this case, it must be -referenced by its unique key. Alternatively, the Facility could have -been referenced by attribute as in: +:numref:`snip-file-icatdata-xml-1` shows a simple example for an ICAT +data XML file having one single ``data`` element that defines four +Datasets. .. 
code-block:: XML + :name: snip-file-icatdata-xml-1 + :caption: A simple example for an ICAT data XML file + :dedent: 2 + + + + + 2023-10-17T07:33:36Z + manual edit + + + + + false + 2012-07-30T01:10:08+00:00 + e209001 + 2012-07-26T15:44:24+00:00 + + + + + + false + 2012-08-06T01:10:08+00:00 + e209002 + 2012-08-02T05:30:00+00:00 + + + + + + false + 2012-07-16T14:30:17+00:00 + e209003 + 2012-07-16T11:42:05+00:00 + + + + + + false + 2012-07-31T22:52:23+00:00 + e209004 + 2012-07-31T20:20:37+00:00 + + + + + + +ICAT object references +...................... + +ICAT object references do not define an ICAT object to be created when +reading the ICAT data file but reference an already existing one. It +is either assumed to exist in ICAT before ingesting the file or it +must appear earlier in the ICAT data file, so that it will be created +before the referencing object is read. + +ICAT objects may either be referenced by reference key or by +attributes. A reference key should be included as a ``ref`` +attribute. + +When referencing the object by attributes, these attributes should be +included using the same name in the reference element. This may also +include attributes of related objects using the same dot notation as +for ICAT JPQL search expressions. Referencing by attributes may be +combined with referencing related objects by reference key, using +``ref`` in place of the related object's attribute names. In any +case, referenced objects must be uniquely defined by the attribute +values. + +ICAT object references may be used in two locations in ICAT data XML +files: as direct subelements of ``data`` or to reference related +objects in many-to-one relations in ICAT object definitions, see +below. In the former case, the name of the object reference element +is the name of the corresponding ICAT entity type (the first letter in +lowercase) with a ``Ref`` suffix appended. 
In that case, the element +should have an ``id`` attribute that will define a local key that can +be used to reference that object in subsequent object references. +This provides a convenient shortcut when the same object needs to +be referenced often, avoiding the need to repeat the same set of +attributes each time. + +In any case, object reference elements only have attributes, but no +content or subelements. + +See :numref:`snip-file-icatdata-xml-1` for a few examples: the first +subelement of the ``data`` element in this case is +``investigationRef``. It references an Investigation (assumed to +exist already) by its attributes ``name`` and ``visitId``. It defines +a local key for that Investigation object in the ``id`` attribute. +The Dataset object definitions in that example each use that local key +to set their relation with the Investigation. The +Dataset object definitions each also include a relation with their +``type``, referencing the related DatasetType by the ``name`` +attribute. Some of the Dataset object definitions also include a +relation with a Sample. The respective Sample object is referenced by +``name`` and the related Investigation. The latter is referenced by +the local key defined earlier in the ``investigation.ref`` attribute. + +ICAT object definitions +....................... + +ICAT object definitions define objects that will be created in ICAT +when ingesting the ICAT data file. For direct subelements of ``data``, +the name of the element must be the name of the corresponding entity +type in the ICAT schema (the first letter in lowercase). + +The subelements of ICAT object definitions are the attributes and +object relations as defined in the ICAT schema using the same names. +Attributes must include the corresponding value as text content of the +element. All many-to-one relations must be provided as ICAT object +references, see above. + +The ICAT object definitions may include one-to-many relations as +subelements. 
In this case, these subelements must in turn be ICAT +object definitions for the related objects. These related objects +will be created along with the parent in one single cascading call. +The object definition for the related object must not include its +relation with the parent object as this is already implied by the +parent and child relationship. + +When appearing as direct subelements of ``data``, ICAT object +definitions may have an ``id`` attribute that will define a local key +that can be used to reference the defined object later on. - - - - - - -The Investigation in the second chunk in the present example includes -related InvestigationGroups that will be created along with the -Investigation. The InvestigationGroup objects include a reference to -the corresponding Grouping respectively. Note that these references -go across chunk boundaries. Thus, unique keys for the Groupings need -to be used here. +.. literalinclude:: ../examples/icatdump-simple.xml + :language: xml + :name: snip-file-icatdata-xml-2 + :caption: An example for an ICAT data XML file + +Consider the example in :numref:`snip-file-icatdata-xml-2`. It +contains two chunks: the first chunk contains four User objects and +three Grouping objects. The Groupings include related UserGroups. +Note that these UserGroups include their relation to the User, but not +their relation with Grouping. The latter is implied by the parent +relation of the object in the file. The second chunk only contains +one Investigation, including related InvestigationGroups. Finally note that the file format also depends on the ICAT schema version: the present example can only be ingested into ICAT server 5.0 @@ -222,11 +265,14 @@ ICAT data YAML files ~~~~~~~~~~~~~~~~~~~~ In this section we describe the ICAT data file format using the YAML -backend. Consider the following example, it corresponds to the same -ICAT content as the XML example above: +backend. 
Consider the example in :numref:`snip-file-icatdata-yaml`; +it corresponds to the same ICAT content as the XML in +:numref:`snip-file-icatdata-xml-2`: .. literalinclude:: ../examples/icatdump-simple.yaml :language: yaml + :name: snip-file-icatdata-yaml + :caption: An example for an ICAT data YAML file ICAT data YAML files start with a head consisting of a few comment lines, followed by one or more YAML documents. YAML documents are @@ -236,13 +282,14 @@ file, which may be useful for debugging in case of issues. Each YAML document defines one chunk of data according to the logical structure explained above. It consists of a mapping having the name -of entity types in the ICAT schema as keys. The values are in turn -mappings that map object ids as key to ICAT object definitions as -value. These object ids define local keys that may be used to -reference the respective object later on. In the present example, the -first chunk contains four User objects and three Grouping objects. -The Groupings include related UserGroups. The second chunk only -contains one Investigation, including related investigationGroups. +of entity types in the ICAT schema (the first letter in lowercase) as +keys. The values are in turn mappings that map object ids as key to +ICAT object definitions as value. These object ids define local keys +that may be used to reference the respective object later on. In the +present example, the first chunk contains four User objects and three +Grouping objects. The Groupings include related UserGroups. The +second chunk only contains one Investigation, including related +InvestigationGroups. Each of the ICAT object definitions corresponds to an object in the ICAT schema. It is again a mapping with the object's attribute and @@ -251,43 +298,21 @@ relations must be provided and reference existing objects, e.g. they must either already have existed before starting the ingestion or appear in the same or an earlier YAML document in the ICAT data file. 
The values of many-to-one relations are reference keys, either local -keys defined in the same YAML document or unique keys. +keys defined in the same YAML document or unique keys. Unlike the XML +backend, the YAML backend does not support referencing objects by +attributes. The object definitions may include one-to-many relations. In this case, the value for the relation name is a list of object definitions for the related objects. These related objects will be created along with the parent in one single cascading call. In the present example, the Grouping objects include their related UserGroup objects. Note -that these UserGroups include their relation to the User, but not -their relation with Grouping. The latter relationship is implied by -the parent relation of the object in the file. - -As an alternative, in the present example, the UserGroups could have -been added to the file as separate objects as in: - -.. code-block:: YAML - - --- - grouping: - Grouping_name-investigation=5F10100601=2DST=5Fowner: - name: investigation_10100601-ST_owner - user: - User_name-db=2Fahau: - affiliation: Goethe University Frankfurt, Faculty of Philosophy and History - email: ahau@example.org - familyName: Hau - fullName: Arnold Hau - givenName: Arnold - name: db/ahau - orcidId: 0000-0002-3263 - userGroup: - UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner): - grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner - user: User_name-db=2Fahau - --- - -Note that the entries in the mappings have no inherent order. The -:ref:`icatingest` script uses a predefined order to read the ICAT +that these UserGroups include their relation to the User, but not with +Grouping. The latter relationship is implied by the parent relation +of the object in the file. + +Note that the entries in the mappings in YAML have no inherent order. 
+The :ref:`icatingest` script uses a predefined order to read the ICAT entity types in order to make sure that referenced objects are created before any object that may reference them. diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst index 7348259..e6e1c25 100644 --- a/doc/src/file-icatingest.rst +++ b/doc/src/file-icatingest.rst @@ -26,7 +26,7 @@ format if we would use plain ICAT data files for this purpose. ingest file format is 1.1. .. versionchanged:: 1.2.0 - add metadata ingest file format version 1.1: add support for + add metadata ingest file format version 1.1, adding support for relating Datasets with Samples. Differences compared to ICAT data XML files diff --git a/doc/src/fileformats.rst b/doc/src/fileformats.rst index c90eaec..c24d37e 100644 --- a/doc/src/fileformats.rst +++ b/doc/src/fileformats.rst @@ -1,8 +1,22 @@ File formats ============ -Some components of python-icat read input files or write output files. -This section describes the file formats being used. +Some components of python-icat read input files or write output files: + +The :ref:`icatdump` command line script fetches content from an ICAT +server and writes it to a file. The :ref:`icatingest` command line +script reads those files and restores the content in an ICAT server. +The ICAT data file format written and read by these scripts +respectively corresponds directly to the ICAT schema. It is rather +generic and may encode any ICAT content. + +The metadata ingest file format is basically a restricted version of +the ICAT data file format. It is read by class +:class:`icat.ingest.IngestReader` for the purpose of ingesting +metadata created by experiments into ICAT. + +See the following sections for a detailed description of these file +formats: .. toctree:: :maxdepth: 1 diff --git a/doc/src/ingest.rst b/doc/src/ingest.rst index 8245e0f..e057422 100644 --- a/doc/src/ingest.rst +++ b/doc/src/ingest.rst @@ -49,8 +49,8 @@ objects read from the input file in ICAT. 
Ingest process -------------- -The processing of ingest files during the instantiation of an -:class:`~icat.ingest.IngestReader` object may be summarized with the +The processing of the metadata during the instantiation of an +:class:`~icat.ingest.IngestReader` object may be summarized by the following steps: 1. Read the metadata and parse the :class:`lxml.etree._ElementTree`. @@ -58,7 +58,7 @@ following steps: 2. Call :meth:`~icat.ingest.IngestReader.get_xsd` to get the appropriate XSD file and validate the metadata against that schema. -3. Inject an ``_environment`` element as first child of the ``data`` +3. Inject an ``_environment`` element as first child of the root element, see below. 4. Call :meth:`~icat.ingest.IngestReader.get_xslt` to get the @@ -78,8 +78,8 @@ individual objects defined in the metadata. The environment element ----------------------- -During the processing of ingest files, an ``_environment`` element -will be injected as the first child of the ``data`` element. In the +During the processing of the metadata, an ``_environment`` element +will be injected as the first child of the root element. In the current version of python-icat, this ``_environment`` element has the following attributes: diff --git a/src/icat/ingest.py b/src/icat/ingest.py index 0b8f2e8..2540535 100644 --- a/src/icat/ingest.py +++ b/src/icat/ingest.py @@ -75,7 +75,7 @@ class IngestReader(XMLDumpFileReader): in favour of :attr:`~icat.ingest.IngestReader.XSLT_Map`. .. versionchanged:: 1.3.0 - inject an element `_environment` as first child of the root + inject an element ``_environment`` as first child of the root element into the input data. """ @@ -188,6 +188,9 @@ def get_xslt(self, ingest_data): def get_environment(self, client): """Get the environment to be injected as an element into the input. + Subclasses may override this method to control the attributes + set in the environment. + :param client: the client object being used by this IngestReader. 
:type client: :class:`icat.client.Client` @@ -201,6 +204,11 @@ def get_environment(self, client): def add_environment(self, client, ingest_data): """Inject environment information into input data. + The attributes set in the environment are determined by + calling :meth:`~icat.ingest.IngestReader.get_environment`. + Subclasses may override this method to fully control the + process of adding the environment element. + :param client: the client object being used by this IngestReader. :type client: :class:`icat.client.Client` @@ -244,11 +252,16 @@ def ingest(self, datasets, dry_run=False, update_ds=False): created in ICAT. In this case, the `datasets` in the argument must already have been created in ICAT beforehand (e.g. the `id` attribute must be set). If `dry_run` is :const:`True`, - the `datasets` don't need to be created beforehand. + the objects in the metadata will be checked for conformance, + but nothing will be committed to ICAT. In this case, the + `datasets` don't need to be created beforehand. if `update_ds` is :const:`True`, the objects in the `datasets` argument will be updated: the attributes and the relations to other objects will be set to the values read from the input. + This is particularly useful in conjunction with `dry_run` in + order to update the `datasets` from the metadata prior to + creating them in ICAT. :param datasets: list of allowed datasets in the input. :type datasets: iterable of :class:`icat.entity.Entity`