Fix: 1381 change to the axes attribute meaning #1396

woutdenolf · 2024-06-30T14:31:43Z

HTML RENDERING OF NXDATA FIX <-> current nxdata for comparison

When reviewing this PR, please keep in mind that the purpose is to rectify NXdata, not improve it. Suggestions for improvements can go in #1381.

For reference: multi-dimensional axes were introduced here https://www.nexusformat.org/2014_axes_and_uncertainties.html

Context

The NXdata definition got a makeover in PR #1213 to make it more understandable. In this effort, I assumed the @axes attribute was supposed to say what all the axes are of the NXdata group. It didn't occur to me this attribute is not about the axes, it is about the signal. It defines what the default axis is for each signal dimension. The unintended change do not make existing files invalid but it does introduce more flexibility which changes things for existing readers as pointed out in issue #1381 by @jacobfilik .

Purpose of this PR

The sole purpose of this PR is to rectify PR #1213 and NOT introduce anything new. The alternative PR #1392 by @PeterC-DLS fixes the situation by carefully modifying some sentences here and there. I would argue however that the entire "axes" section in the introduction has been structured with the less restrictive @axes in mind. In this PR I refactor the entire "axes" section to better reflect restrictive @axes.

Details on @axes rectification

This is the current less restrictive @axes attribute definition which I'm rectifying

The `@axes` attribute provides the names of all the `AXISNAME` fields in the NXdata group.
The corresponding `@AXISNAME_indices` attributes provide the signal dimensions they span
and when missing the position(s) of `AXISNAME` in `@axes` are taken as the signal dimensions
spanned by `AXISNAME`. The shape of an `AXISNAME` field is required to be equal to the shape
of the spanned signal dimensions.

This definition is much simpler and more concise than the definition in the "axes" section in this PR. However it removes the restriction that length(axes) == rank(signal) so we need to put it back in with @axes being the default axes, not all axes.

base_classes/NXdata.nxdl.xml

… directly

PeterC-DLS · 2024-07-17T12:00:04Z

Also add to @axes the following from my #1392

    <dimensions rank="1">
        <dim index="1" value="dataRank"/>
    </dimensions>

base_classes/NXdata.nxdl.xml

@axes

…s +1, indices can be omitted, indices cannot contradict @axes

- NX_FLOAT -> NX_NUMBER for scaling_factor and offset - Sync language with number #1396 for the FIELDNAME convention

base_classes/NXdata.nxdl.xml

PeterC-DLS · 2024-08-20T08:28:08Z

base_classes/NXdata.nxdl.xml

+			.. code-block::
+
+				data:NXdata
+				  @signal = "data"


@signal is optional and if missing, we use data so maybe simplest example should omit it.

I did not know this. Not specified in the definition. I'll add it.

I guess when @axes is missing, we use [".", ".", ...] (as many dots as rank)?

If @signal is missing, there is no rule that says "data" is required as a field name. In this case and multiple fields, selection of the signal field for the default plot is not guaranteed.

I thought NXdata evolved from a fixed signal field (usually called data, see e.g.

definitions/base_classes/NXdata.nxdl.xml

Line 152 in 04bfe4c

<field name="data" type="NX_NUMBER">

) with an (optional?) attribute @signal=1 so to be backwardly compatible, a modern parser could assume the group attribute signal if missing is "data".

I think that @prjemian is correct. I just scanned through the minutes of all the NIAC meetings, and I can't find any reference to a default name for the signal. I believe that the current (non-deprecated) method of defining signals and axes requires both signal and axes attributes to be defined at group level.

It was found that assigning these links to fields would break when the fields were linked (local or external) into other groups. Such field attributes could produce a group with multiple field @signal attributes.

Perhaps we need to make these attributes required then.

Could bring that up again for a NIAC vote. Should not make it required until then, since NIAC voted the other way.

Consider a simple use case (for not having this attribute). User wants to save a 2-D image, only the 2-D image. NeXus file consists of /entry:NXentry/data:NXdata/image where image is the 2-D image. We've gotten strong pushback on requiring @signal in this use case.

PeterC-DLS · 2024-08-20T09:09:04Z

base_classes/NXdata.nxdl.xml

-		`Dimensions`
+		4. When :ref:`AXISNAME_indices &lt;/NXdata@AXISNAME_indices-attribute&gt;` is the same as the indices of "AXISNAME" in the
+		   :ref:`axes &lt;/NXdata@axes-attribute&gt;` attribute, there is no need to provide
+		   :ref:`AXISNAME_indices &lt;/NXdata@AXISNAME_indices-attribute&gt;`.


Only for 1D axis fields?

If this applies in general then @axes=["x", "x"] would imply @x_indices=[0,1]. That would be ok when x is 2D.

When x is 1D it would not work. It would violate two requirements

The number of values in AXISNAME_indices must be equal to the rank of the corresponding AXISNAME field.

The indices of "AXISNAME" in the axes attribute must be a subset of AXISNAME_indices.

So if a 1D x needs to work for @axes=["x", "x"] we need to change those requirements. Edit: for the record I don't think this is needed and if it would be needed, you can use softlinks to have one dataset with two different names.

Hmm, a pathological case could be @x_indices=[1,0]

Ok I see. So omitting indices can only work unambiguously in 1D cases.

Another pathological case would be @x_indices=[0,0]. I'm not sure what this means but technically the definition does not forbid it. So we should say indices cannot contain duplicates?

Eeeeww! That's pathological, and likely a mistake.

Ah ok, the first example. axes=["x", "x"] would be equivalent to @x_indices=[0,1] no?

Eeeeww! That's pathological, and likely a mistake.

Agreed but technically the definition does not present it as far as I can see, which we should probably fix but not sure how.

Stating that AXISNAME_indices cannot have duplicates would solve it. But it would prohibit the case where you have for example a single 1D x axis of lets say 100 points and you have a 2D signal of 100 x 100 points and you want x to be used for both dimensions. As I said before, you could define x1 and x2 and both can be symlinks to x so effectively x is used for both dimensions even though @axes=["x1", "x2"].

@prjemian Can you modify the code provided in #1396 (comment) to prove that the current specs allow a 1D signal to be used for multiple dimensions without breaking support for multi-dimensional axes?

I don't think it is possible.

Not sure I know how to proceed on this directly. For the variety of different formats considered by canSAS, the discussion resulted in this documentation where the examples are repeated. Those are the places I would examine to test multi-dimensional data.

base_classes/NXdata.nxdl.xml

rayosborn · 2024-08-20T14:33:46Z

I would suggest changing the following text:

The AXISNAME_indices attributes describe the DATA dimensions spanned by the corresponding AXISNAME fields.

to:

The optional AXISNAME_indices attributes describe the DATA dimensions spanned by the corresponding AXISNAME fields. These attributes may be used to indicate alternative axis coordinates to the default.

My rationale is that the fact that these attributes are not necessarily needed may otherwise be missed by those who don't read the following points carefully.

rayosborn · 2024-08-20T14:37:33Z

I would suggest changing histogram bin edges to histogram bin boundaries, which is more common usage in the neutron time-of-flight community.

rayosborn · 2024-08-20T14:46:52Z

Just a general comment. I think @woutdenolf has done a great job of clarifying the various scenarios for specifying axes, but I found the list following "Additional requirements for a complete definition" to be difficult to follow, even as someone who has participated in this discussion for years. I wonder if it should have a separate heading, such as "Multidimensional Axes" so that people know they only need to get their heads round this if they have specific needs for more complex axis specifications. To emphasize this, I think it might be better if the simple example at the top is two-dimensional, with both x and y axes, to prevent people from thinking that multidimensional data are intrinsically complicated to plot. The only drawback is that it opens a can of worms about what is the horizontal axis, x or y, when axes=["x", "y"].

woutdenolf · 2024-08-20T18:08:38Z

NIAC Aug 2024 comments: target audience are scientists on the one hand and software developers that need to write/read HDF5 files on the other hand. The rigorous definition should not alienate the first group. Split the description at the top accordingly.

woutdenolf · 2024-09-27T11:57:05Z

As an aside for @rayosborn : code in Silx that takes coordinates and signals and makes an image from it https://github.com/silx-kit/silx/blob/11919895f51353c8467b85a3753f142778c2bbaf/src/silx/gui/plot/items/scatter.py#L547-L797

woutdenolf · 2024-09-27T21:43:58Z

Suggestion to progress: this is how the NXdata docs are typically consumed by a human

First you want to understand what NXdata can do.
Then you need to write a script or library to read/write NXdata based on the spec.

For the first step we could start the docs with several examples (image + nexus structure). I need suggestions before I make the effort to generate the data. The examples need to be simple but we also want to make sure we don't guide the user in the wrong direction. For example from the image example one may assume equal coordinate spacing due to the square pixels but the data. But you have a 1D array of coordinates for an axis (N values), not an offset and pixel size (2 values). So unless you convert the N values in 2 values, you cannot assume regular spacing.

We can take inspiration from the matplotlib docs. A quick aside, with 1D axes we are doing rectilinear grids.

For the second use case we need a spec that does not leave room for ambiguity. The current spec is too complex and needs a better formulation (especially separating what is asserted from what is inferred). I propose to express the current spec in code and let NIAC members check it before investigating different way of describing the spec.

from itertools import combinations
import numpy
import h5py


def nxdata_coordinates(nxdata: h5py.Group) -> None:
    """Illustrate the axes specs. Not written for efficiency."""

    # Get the values of the dependent variable.

    nxdata_attrs = dict(nxdata.attrs)
    signal = nxdata_attrs.get("signal")
    if signal is None:
        raise RuntimeError("NXdata does not have a signal")
    data = nxdata[signal]
    data_rank = data.ndim
    data_shape = data.shape

    # The length of the axes attribute must be equal to the data rank.
    axes_attr = nxdata_attrs.get("axes", ["."] * data_rank)
    if len(axes_attr) != data_rank:
        raise ValueError(f"{len(axes_attr)} default axes but the rank is ({data_rank})")

    # All axis names

    axisnames_with_explicit_dims = {
        name[:-8]
        for name in nxdata_attrs
        if name.endswith("_indices")
        and name[:-8] in nxdata
        and isinstance(nxdata[name[:-8]], h5py.Dataset)
    }

    axisnames = set(axes_attr) - {"."} | axisnames_with_explicit_dims

    # Map axis names to the dimensions they span.

    axisname_to_dimensions: dict[str, list[int]] = {}

    for axisname in axisnames:
        axis_shape = nxdata[axisname].shape

        default_dimensions = [i for i, name in enumerate(axes_attr) if name == axisname]

        if axisname in axisnames_with_explicit_dims:
            axisname_indices = nxdata_attrs[f"{axisname}_indices"]
            if isinstance(axisname_indices, numpy.ndarray):
                dimensions = axisname_indices.tolist()
            else:
                dimensions = [axisname_indices]

            if len(set(axisname_indices)) < len(axisname_indices):
                raise ValueError(
                    f"'{axisname}_indices' cannot have duplicate dimensions"
                )

            if not set(default_dimensions).issubset(dimensions):
                raise ValueError(
                    f"default dimensions spanned by axis '{axisname}' are {default_dimensions} but the explicit dimensions are {axisname_indices}"
                )
        else:
            dimensions = default_dimensions

        axisname_to_dimensions[axisname] = dimensions

    # Map dimensions to the axis names that span them.

    dimensions_to_axisnames: dict[int, list[str]] = {
        dimension: [] for dimension in range(data_rank)
    }
    for axisname, dimensions in axisname_to_dimensions.items():
        for dimension in dimensions:
            is_default = axisname == axes_attr[dimension]
            if is_default:
                dimensions_to_axisnames[dimension].insert(0, f"{axisname}*")
            else:
                dimensions_to_axisnames[dimension].append(axisname)

    # Axis dimensions must be equal to the data dimensions they span.

    for axisname, dimensions in axisname_to_dimensions.items():
        axis_shape = nxdata[axisname].shape
        spanned_shape = tuple(data_shape[dimension] for dimension in dimensions)
        if axis_shape not in _possible_shapes(spanned_shape):
            raise ValueError(
                f"axis '{axisname}' has shape {axis_shape} but the spanned data dimensions have shape {spanned_shape}"
            )

    # Print NeXus structure.

    print(f"{nxdata.name}:NXdata")
    for name, value in nxdata_attrs.items():
        print(
            f"  @{name}={value.tolist() if isinstance(value, numpy.ndarray) else value}"
        )
    for name, value in nxdata.items():
        print(f"  {name}: {value.dtype}{list(value.shape)}")

    # Print axes <-> dimensions mapping.

    print()
    for axisname, dimensions in axisname_to_dimensions.items():
        print(f"Axis '{axisname}' spans dimensions {dimensions}")

    print()
    for dimension, axes_names in dimensions_to_axisnames.items():
        print(f"Dimension '{dimension}' is spanned by axes {axes_names}")


def _possible_shapes(shape: tuple[int]) -> list[tuple[int]]:
    """Every dimension can have N+1 coordinates where N is the number of data points along any dimension."""
    shapes = [shape]
    base_shape = numpy.array(shape)

    rank = len(base_shape)
    indices = list(range(rank))
    for nhistogram in range(1, rank + 1):
        for idx_boundaries in combinations(indices, nhistogram):
            shape = base_shape.copy()
            shape[list(idx_boundaries)] += 1
            shapes.append(tuple(shape.tolist()))

    return shapes


def generate_nxdata() -> tuple[str, str]:
    hdf5_filename = "test.h5"
    with h5py.File(hdf5_filename, "w") as nxroot:
        nxroot.attrs["NX_class"] = "NXroot"
        nxentry = nxroot.create_group("scan1")
        nxentry.attrs["NX_class"] = "NXentry"
        nxdata = nxroot.create_group("data")
        nxdata.attrs["NX_class"] = "NXdata"

        nxdata.attrs["signal"] = "data"
        nxdata["data"] = numpy.zeros((10, 10, 20, 30, 30))

        nxdata["x"] = numpy.zeros((10, 10))
        nxdata["y"] = numpy.zeros((30,))  # cannot be used for more than 1 dimension
        nxdata["y1"] = h5py.SoftLink("y")
        nxdata["y2"] = h5py.SoftLink("y")

        nxdata.attrs["axes"] = ["x", ".", ".", "y1", "y2"]  # same as [0]
        nxdata.attrs["x_indices"] = [0, 1]  # superset of [0]
        # nxdata.attrs["y1_indices"] = [3]
        nxdata.attrs["y2_indices"] = [4]

        return hdf5_filename, nxdata.name


if __name__ == "__main__":
    hdf5_filename, data_path = generate_nxdata()
    with h5py.File(hdf5_filename, "r") as nxroot:
        nxdata = nxroot[data_path]
        nxdata_coordinates(nxdata)

/data:NXdata
  @NX_class=NXdata
  @axes=['x', '.', '.', 'y1', 'y2']
  @signal=data
  @x_indices=[0, 1]
  @y2_indices=[4]
  data: float64[10, 10, 20, 30, 30]
  x: float64[10, 10]
  y: float64[30]
  y1: float64[30]
  y2: float64[30]

Axis 'y2' spans dimensions [4]
Axis 'y1' spans dimensions [3]
Axis 'x' spans dimensions [0, 1]

Dimension '0' is spanned by axes ['x*']
Dimension '1' is spanned by axes ['x']
Dimension '2' is spanned by axes []
Dimension '3' is spanned by axes ['y1*']
Dimension '4' is spanned by axes ['y2*']

@jacobfilik Since you detected the original issue, could you check whether the code describes the specs as you understand them?

phyy-nx · 2024-09-28T09:54:56Z

👍 from the NIAC as to adding a gallery to the NXdata page of examples

phyy-nx · 2024-09-28T10:04:35Z

This PR can be divided into two sets of fixes. The first resolves the problem in #1381, and can be summarized as follows:

The @axes attribute should give the default axis to be used for every dimension in the given order as it is in the signal ("." is allowed).

The second aspect of this PR is to greatly improve the documentation.

No votes are needed for either, as the first is a correction to something that shouldn't have been added in the first place, and the second we're just happy to have :)

woutdenolf added 2 commits June 30, 2024 14:50

sometimes referred to as -> also referred to as

b1ad2e6

NXdata@axes rectification: contains the default axes, not all axes

26315aa

woutdenolf linked an issue Jun 30, 2024 that may be closed by this pull request

Change to the axes attribute meaning #1381

Open

woutdenolf requested a review from a team June 30, 2024 14:38

This was referenced Jun 30, 2024

Retighten specification of NXdata's axes attribute for #1381 #1392

Closed

Change to the axes attribute meaning #1381

Open

PeterC-DLS reviewed Jul 1, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Jul 1, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Jul 1, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Jul 1, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Jul 1, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Jul 1, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Jul 1, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

woutdenolf added 13 commits July 1, 2024 20:48

clarify what "equal shape" means

d19ff47

rephrase the fact that a list of axis names is not provided by NXdata…

7844bda

… directly

clarify that one dimension is covered by more than on axis

d85f56c

capitals DATA in analogy to AXISNAME

61343dd

switch order of examples for clarity

939fb7e

use "spanned" instead of "covered"

ecd53dc

each dimension can only have one default axis

69e53c9

remove justification for the complex axes definition

9a78a11

reword

1c61a09

reword

6a014b5

change quotes

115e631

change quotes

c8e5d6c

reword

27cfde3

PeterC-DLS reviewed Jul 17, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Jul 17, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Jul 17, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

woutdenolf force-pushed the 1381-change-to-the-axes-attribute-meaning branch 7 times, most recently from edba426 to a518e52 Compare August 16, 2024 03:37

NIAC suggestions: start with simple example at the top, histogram axi…

e37a676

…s +1, indices can be omitted, indices cannot contradict @axes

woutdenolf force-pushed the 1381-change-to-the-axes-attribute-meaning branch from a518e52 to e37a676 Compare August 16, 2024 03:47

phyy-nx added a commit that referenced this pull request Aug 19, 2024

Code review

9d4c4d2

- NX_FLOAT -> NX_NUMBER for scaling_factor and offset - Sync language with number #1396 for the FIELDNAME convention

PeterC-DLS reviewed Aug 20, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Aug 20, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

PeterC-DLS reviewed Aug 20, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Outdated Show resolved Hide resolved

woutdenolf added 2 commits August 20, 2024 13:48

review

0bf8672

#edges = #data_points + 1

6b23f02

PeterC-DLS reviewed Aug 20, 2024

View reviewed changes

base_classes/NXdata.nxdl.xml Show resolved Hide resolved

woutdenolf force-pushed the 1381-change-to-the-axes-attribute-meaning branch 2 times, most recently from 04b2dd0 to a919e99 Compare September 29, 2024 15:30

WIP

c27e2d3

woutdenolf force-pushed the 1381-change-to-the-axes-attribute-meaning branch from a919e99 to c27e2d3 Compare September 29, 2024 16:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: 1381 change to the axes attribute meaning #1396

Fix: 1381 change to the axes attribute meaning #1396

woutdenolf commented Jun 30, 2024 •

edited

Loading

PeterC-DLS commented Jul 17, 2024

PeterC-DLS Aug 20, 2024

woutdenolf Aug 20, 2024

woutdenolf Aug 20, 2024

prjemian Aug 20, 2024

PeterC-DLS Aug 20, 2024

rayosborn Aug 20, 2024

prjemian Aug 20, 2024

PeterC-DLS Aug 20, 2024

prjemian Aug 20, 2024 •

edited

Loading

PeterC-DLS Aug 20, 2024

woutdenolf Aug 20, 2024 •

edited

Loading

PeterC-DLS Aug 20, 2024

woutdenolf Aug 20, 2024

woutdenolf Aug 20, 2024

prjemian Aug 20, 2024

woutdenolf Aug 20, 2024

woutdenolf Aug 20, 2024

woutdenolf Sep 27, 2024 •

edited

Loading

prjemian Sep 28, 2024

rayosborn commented Aug 20, 2024 •

edited

Loading

rayosborn commented Aug 20, 2024

rayosborn commented Aug 20, 2024 •

edited

Loading

woutdenolf commented Aug 20, 2024 •

edited

Loading

woutdenolf commented Sep 27, 2024 •

edited

Loading

woutdenolf commented Sep 27, 2024 •

edited

Loading

phyy-nx commented Sep 28, 2024

phyy-nx commented Sep 28, 2024 •

edited

Loading

Fix: 1381 change to the axes attribute meaning #1396

Are you sure you want to change the base?

Fix: 1381 change to the axes attribute meaning #1396

Conversation

woutdenolf commented Jun 30, 2024 • edited Loading

Context

Purpose of this PR

Details on @axes rectification

PeterC-DLS commented Jul 17, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

prjemian Aug 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

woutdenolf Aug 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

woutdenolf Sep 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rayosborn commented Aug 20, 2024 • edited Loading

rayosborn commented Aug 20, 2024

rayosborn commented Aug 20, 2024 • edited Loading

woutdenolf commented Aug 20, 2024 • edited Loading

woutdenolf commented Sep 27, 2024 • edited Loading

woutdenolf commented Sep 27, 2024 • edited Loading

phyy-nx commented Sep 28, 2024

phyy-nx commented Sep 28, 2024 • edited Loading

woutdenolf commented Jun 30, 2024 •

edited

Loading

prjemian Aug 20, 2024 •

edited

Loading

woutdenolf Aug 20, 2024 •

edited

Loading

woutdenolf Sep 27, 2024 •

edited

Loading

rayosborn commented Aug 20, 2024 •

edited

Loading

rayosborn commented Aug 20, 2024 •

edited

Loading

woutdenolf commented Aug 20, 2024 •

edited

Loading

woutdenolf commented Sep 27, 2024 •

edited

Loading

woutdenolf commented Sep 27, 2024 •

edited

Loading

phyy-nx commented Sep 28, 2024 •

edited

Loading