Processing Parameter and File Management #373

jadball · 2025-01-10T14:28:46Z

Following on from #318, this issue concerns two problems:

- How do we manage the files that are created as part of 3DXRD processing?
- How do we remember which segmentation/indexing/mapping parameters were used to process the data?

For file management, we think the best solution is to be as minimal as possible.
Currently, the dataset class keeps track of all the files that are created, with embedded absolute paths in the H5. This is bad for several reasons, mainly portability: the paths break as soon as the data are moved or copied.
However, it's convenient because we don't need to define all the paths at the top of each analysis notebook.

For file management, we propose the following:

- The notebook/ewoks task is always in the same folder as the processed data that it generates.
- The notebook/ewoks task looks for relative file names.
- Default file names should enable automation (e.g. the segmentation task will by default create peaks.h5, and the indexing task will by default look for a peaks.h5 file in the same folder).
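
A minimal sketch of how the defaults above could be resolved, assuming only the standard library. The function name `resolve_in_workdir` is hypothetical, and peaks.h5 is the only default name taken from the proposal:

```python
from pathlib import Path

# Default name from the proposal above: segmentation writes it, indexing reads it.
DEFAULT_PEAKS = "peaks.h5"


def resolve_in_workdir(workdir, filename=None, default=DEFAULT_PEAKS):
    """Return workdir/filename, falling back to the default file name.

    Only relative names are accepted, so no absolute path ever needs to be
    stored in the H5 files (hypothetical helper, not part of ImageD11).
    """
    path = Path(filename) if filename is not None else Path(default)
    if path.is_absolute():
        raise ValueError(f"expected a relative file name, got {path}")
    return Path(workdir) / path


# The segmentation task writes here by default ...
segmentation_output = resolve_in_workdir("processed")
# ... and the indexing task looks in the same place without any configuration.
indexing_input = resolve_in_workdir("processed")
```

Because both tasks fall back to the same default, a notebook only needs to name a file when it deviates from the convention.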

For keeping track of the processing parameters, we propose the following:

In each output H5 file, we store the following in groups:
- The data itself (peaks/UBIs/etc)
- The path to the input file that created the data
- The path to the notebook/ewoks task that created the data
- The parameters used to create the data
  - e.g. segmentation options
  - indexing options
- For grains.h5, we should save the spot3d_id for each grain

Filename templates seem to need:

PROCESSED_DATA/{sample}/{sample}{dataset}{version}peaks_table.h5
PROCESSED_DATA/{version}/{sample}/{sample}{dataset}_peaks_table.h5
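
To compare the two layouts, the templates can be filled in with `str.format`. This is only a sketch: the templates are copied exactly as written above (so the fields run together wherever no separator is shown), and the sample, dataset and version values are made up for illustration.

```python
# Templates copied verbatim from the two candidates above.
VERSION_IN_NAME = "PROCESSED_DATA/{sample}/{sample}{dataset}{version}peaks_table.h5"
VERSION_AS_DIR = "PROCESSED_DATA/{version}/{sample}/{sample}{dataset}_peaks_table.h5"


def peaks_table_path(template, sample, dataset, version):
    """Fill a filename template with the given fields."""
    return template.format(sample=sample, dataset=dataset, version=version)


# Hypothetical values, for illustration only.
fields = dict(sample="S1", dataset="ds1", version="v2")
in_name = peaks_table_path(VERSION_IN_NAME, **fields)
as_dir = peaks_table_path(VERSION_AS_DIR, **fields)
print(in_name)  # PROCESSED_DATA/S1/S1ds1v2peaks_table.h5
print(as_dir)   # PROCESSED_DATA/v2/S1/S1ds1_peaks_table.h5
```

With the version as a directory level, a whole reprocessing run can be moved or deleted as one folder. With the version in the file name, all versions of a dataset sit side by side.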

Proposed structure of grains file:

grains.h5
- phase_1
  - Data
    - UBIs
    - Translations
    - spot3d_id
  - Parent
    - relative peaks_3d path
  - relative notebook path/ewoks task path
  - Indexing parameters
    - hkl_tol etc.
- phase_2
  - Data
    - UBIs
    - Translations
    - spot3d_id
  - Parent
    - relative peaks_3d path
  - relative notebook path/ewoks task path
  - Indexing parameters
    - hkl_tol etc.
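
The layout above can be sketched as a plain nested dict before committing to any HDF5 code. This is an illustration only: the key names "peaks_3d" and "Task", the file names and the hkl_tol value are assumptions, and writing the tree to disk (for example with h5py) is left out.

```python
import os


def phase_group(ubis, translations, spot3d_ids,
                grains_path, peaks_path, task_path, indexing_params):
    """Build one phase group, storing paths relative to the grains file."""
    base = os.path.dirname(grains_path) or "."
    return {
        "Data": {
            "UBIs": ubis,
            "Translations": translations,
            "spot3d_id": spot3d_ids,
        },
        # Relative paths stay valid when the whole folder is moved.
        "Parent": {"peaks_3d": os.path.relpath(peaks_path, base)},
        "Task": os.path.relpath(task_path, base),
        "Indexing parameters": dict(indexing_params),
    }


def flatten(tree, prefix=""):
    """Yield (HDF5-style path, value) pairs for every leaf of the tree."""
    for key, value in tree.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical inputs: empty arrays stand in for the real grain data.
grains = {
    "phase_1": phase_group(
        ubis=[], translations=[], spot3d_ids=[],
        grains_path="processed/grains.h5",
        peaks_path="processed/peaks_3d.h5",
        task_path="processed/index.ipynb",
        indexing_params={"hkl_tol": 0.05},
    )
}
layout = dict(flatten(grains))
```

Each key of `layout` (for example `/phase_1/Data/UBIs`) would become one dataset in grains.h5.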

jonwright · 2025-01-10T16:21:41Z

#333 is related to this - what goes in which file and in which format.

jadball · 2025-01-10T19:21:08Z

Should wait on #371

loichuder · 2025-01-14T16:02:00Z

Note related to NeXus file structure: a valid NeXus file should contain at least one NXentry group (https://manual.nexusformat.org/classes/base_classes/NXentry.html#nxentry)

Then, in Ewoks, we often add one child NXprocess per processing step where we store the processing results and configuration.

The resulting structure could then be something like this:

peaks.h5 (NXroot)
- entry (NXentry)
  - segmentation (NXprocess)
    - ...
  - indexing (NXprocess)
    - ...

This is just for information: I am not saying you should follow this. Perhaps it doesn't even fit your use case.
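
As a rough illustration of that structure, here is a sketch that models the file as nested dicts and checks the one rule stated above (at least one NXentry). The helper names are hypothetical and nothing here touches HDF5; in a real file the class would be stored in each group's NX_class attribute.

```python
def nx_group(nx_class, **children):
    """Describe a group by its NX_class and its named child groups."""
    return {"NX_class": nx_class, "children": children}


def has_nxentry(root):
    """True if the root holds at least one NXentry group."""
    return any(child["NX_class"] == "NXentry"
               for child in root["children"].values())


def processing_steps(root):
    """List (entry name, step name) for every NXprocess under an NXentry."""
    return [
        (entry_name, step_name)
        for entry_name, entry in root["children"].items()
        if entry["NX_class"] == "NXentry"
        for step_name, step in entry["children"].items()
        if step["NX_class"] == "NXprocess"
    ]


# Mirrors the peaks.h5 outline above: one entry, one NXprocess per step.
peaks_file = nx_group(
    "NXroot",
    entry=nx_group(
        "NXentry",
        segmentation=nx_group("NXprocess"),
        indexing=nx_group("NXprocess"),
    ),
)
steps = processing_steps(peaks_file)
```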

jadball · 2025-01-16T13:24:09Z

Could have one NXprocess entry per file, then a master file that links them all together

jonwright · 2025-02-05T18:04:54Z

Here are some links to existing codes I could find elsewhere - it looks impossible to be compatible with anyone else - but perhaps some inspiration...

For diffractometer geometry and an example using "Nexus" for crystallography, Ray Osborn has a lot of good stuff here: https://github.com/nexpy/nxrefine. I just had a look at his code and found:

- parameters like hkl_tolerance are put into an NXparameters group
- pixel_mask is put into instrument/detector/pixel_mask
- a multiprocessing resource tracker workaround (e.g. an alternative to the monkeypatching in #389, "refactor properties.py - 4d peak labelling #265 + make sandbox importable closing #255")
- scattering angles go to a peaks group

Elsewhere in Dials they can dump to NXreflections:
https://manual.nexusformat.org/classes/base_classes/NXreflections.html
This would mean figuring out the mapping of "observed_px_x" -> "fc" or "f_raw", "observed_x" -> "yl", etc., and then adding things we don't find in there or dealing with problems like image flips. Presumably, we can write whatever we like in our "own" group and add a "Nexusformat" group that links to items we have where the definitions match.

For grain orientations, Nexus has ub_matrix[n_comp, 3, 3] within NXsample. This might work for box beam maps if position centroids (and other things) can be added. Perhaps ask Maciej/Carsten whether our ub convention matches spec, bliss, binoculars, etc.

Further digging on file formats brought me to reading distortion files from XDS for MX, presumably in CBF format, and I found code that imports pycbf, which I started in 2005.

There is a data model for ISpyB but I don't think it helps us.

jonwright · 2025-03-05T18:45:18Z

Seen today: the motor positions that go with the calibration fit parameters (frelx, etc.) need to match the motor positions for the data being processed.

Everything fails when the wrong calibration is used. Today, we don't save any instrument metadata with parameters... Something to fix in the future perhaps?
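
If instrument metadata were saved with the parameters, a check along these lines could catch a mismatched calibration before processing starts. This is a sketch under assumptions: the function name, the dict-of-motor-positions inputs and the tolerance are all hypothetical, and only the motor name frelx comes from the comment above.

```python
import math


def mismatched_motors(calibration_motors, dataset_motors, tol=1e-3):
    """Return {motor: (calibration value, dataset value)} for every motor
    that is missing from the dataset or differs by more than tol."""
    mismatches = {}
    for name, cal_value in calibration_motors.items():
        data_value = dataset_motors.get(name)
        if data_value is None or not math.isclose(
            cal_value, data_value, rel_tol=0.0, abs_tol=tol
        ):
            mismatches[name] = (cal_value, data_value)
    return mismatches


# Hypothetical positions: the detector was moved after the calibration.
problems = mismatched_motors({"frelx": 100.0}, {"frelx": 105.0})
if problems:
    print(f"calibration does not match dataset motors: {problems}")
```

An empty result means every motor recorded with the calibration agrees with the dataset to within the tolerance.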

jadball mentioned this issue Jan 10, 2025

Saving processing parameters & Nexus output, etc #318

Closed

jadball mentioned this issue Jan 10, 2025

New release #371

Open

This was referenced Jan 17, 2025

Ewoks integration #358

Open

colfile_to_hdf: TypeError: Only chunked datasets can be resized #384

Open
