Processing Parameter and File Management #373

Open
jadball opened this issue Jan 10, 2025 · 6 comments

Comments

@jadball
Contributor

jadball commented Jan 10, 2025

Following on from #318, this issue concerns two problems:

  • How do we manage the files that are created as part of 3DXRD processing?
  • How do we remember which segmentation/indexing/mapping parameters were used to process the data?

For file management, we think the best solution is to be as minimal as possible.
Currently, the dataset class keeps track of all the files that are created by embedding absolute paths in the H5 file. This is bad for several reasons, mainly portability: the stored paths break as soon as the data are moved or copied.
However, it's convenient because we don't need to define all the paths at the top of each analysis notebook.

For file management, we propose the following:

  • The notebook/ewoks task is always in the same folder as the processed data that it generates
  • The notebook/ewoks task looks for relative file names
  • Default file names should make automation possible (e.g. the segmentation task will by default create peaks.h5, and the indexing task will by default look for a peaks.h5 file in the same folder).
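A minimal sketch of the "same folder, relative names, sensible defaults" idea, using only the standard library. The helper names (`resolve_in_workdir`, `find_input`) are hypothetical; only the default name peaks.h5 comes from the proposal above.

```python
from pathlib import Path

# Default name taken from the proposal; any other defaults would be assumptions.
DEFAULT_PEAKS = "peaks.h5"


def resolve_in_workdir(workdir, filename=DEFAULT_PEAKS):
    """Return workdir/filename, refusing absolute names so outputs stay portable."""
    name = Path(filename)
    if name.is_absolute():
        raise ValueError(f"expected a relative file name, got {filename!r}")
    return Path(workdir) / name


def find_input(workdir, filename=DEFAULT_PEAKS):
    """Locate the file written by the previous step in the same folder."""
    path = resolve_in_workdir(workdir, filename)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run the previous step first")
    return path
```

With this, a segmentation step would write to `resolve_in_workdir(".")` and an indexing step would call `find_input(".")` without either one needing a path defined at the top of the notebook.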

For keeping track of the processing parameters, we propose the following:

  • In each output H5 file, we store the following in groups:
    • The data itself (peaks/UBIs/etc)
    • The path to the input file that created the data
    • The path to the notebook/ewoks task that created the data
    • The parameters used to create the data
      • e.g. segmentation options
      • indexing options
    • for grains.h5, we should save the spot3d_id for each grain

Filename templates seem to need:

PROCESSED_DATA/{sample}/{sample}{dataset}{version}peaks_table.h5
PROCESSED_DATA/{version}/{sample}/{sample}{dataset}_peaks_table.h5

Proposed structure of grains file:

  • grains.h5
    • phase_1
      • Data
        • UBIs
        • Translations
        • spot3d_id
      • Parent
        • relative peaks_3d path
      • relative notebook path/ewoks task path
      • Indexing parameters
        • hkl_tol etc.
    • phase_2
      • Data
        • UBIs
        • Translations
        • spot3d_id
      • Parent
        • relative peaks_3d path
      • relative notebook path/ewoks task path
      • Indexing parameters
        • hkl_tol etc.
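
A rough sketch of how one phase of this layout might be written with h5py. The group and dataset names follow the bullets above, but storing paths as attributes and storing spot3d_id as one variable-length integer array per grain are assumptions, not an agreed schema; `write_phase` is a hypothetical helper.

```python
import h5py
import numpy as np


def write_phase(h5file, phase_name, ubis, translations, spot3d_ids,
                peaks_path, task_path, indexing_pars):
    """Write one phase group: data, parent file, task path and parameters."""
    phase = h5file.require_group(phase_name)

    data = phase.create_group("Data")
    data.create_dataset("UBIs", data=np.asarray(ubis, dtype=float))
    data.create_dataset("Translations", data=np.asarray(translations, dtype=float))
    # Assumption: each grain owns a different number of peaks, so use a
    # variable-length integer array per grain.
    ids = data.create_dataset("spot3d_id", shape=(len(spot3d_ids),),
                              dtype=h5py.vlen_dtype(np.int64))
    for i, grain_ids in enumerate(spot3d_ids):
        ids[i] = np.asarray(grain_ids, dtype=np.int64)

    # Relative paths only, so the folder can be moved as a whole.
    parent = phase.create_group("Parent")
    parent.attrs["peaks_3d_path"] = str(peaks_path)
    phase.attrs["task_path"] = str(task_path)

    # Scalar parameters (hkl_tol etc.) stored as attributes.
    pars = phase.create_group("Indexing parameters")
    for key, value in indexing_pars.items():
        pars.attrs[key] = value
    return phase
```

Calling this once per phase on an open `h5py.File("grains.h5", "w")` would give the phase_1/phase_2 tree above.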
@jonwright
Member

#333 is related to this - what goes in which file and in which format.

@jadball jadball mentioned this issue Jan 10, 2025
@jadball
Contributor Author

jadball commented Jan 10, 2025

Should wait on #371

@loichuder
Contributor

Note related to NeXus file structure: a valid NeXus file should contain at least one NXentry group (https://manual.nexusformat.org/classes/base_classes/NXentry.html#nxentry)

Then, in Ewoks, we often add one child NXprocess per processing step where we store the processing results and configuration.

The resulting structure could then be something like this:

  • peaks.h5 (NXroot)
    • entry (NXentry)
      • segmentation (NXprocess)
        • ...
      • indexing (NXprocess)
        • ...

This is just for information: I am not saying you should follow this. Perhaps it doesn't even fit your use case.
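
For reference, the NXroot/NXentry/NXprocess layout above could be produced with plain h5py roughly as follows. This is a sketch, not what Ewoks itself does: `new_entry` and `add_process` are hypothetical helpers, and keeping the parameters in a child group called "configuration" tagged as NXcollection is an assumption.

```python
import h5py


def new_entry(h5file, name="entry"):
    """Tag the file root as NXroot and create a single NXentry group."""
    h5file.attrs["NX_class"] = "NXroot"
    h5file.attrs["default"] = name
    entry = h5file.create_group(name)
    entry.attrs["NX_class"] = "NXentry"
    return entry


def add_process(entry, name, program, parameters):
    """Add one NXprocess per processing step and record its configuration."""
    process = entry.create_group(name)
    process.attrs["NX_class"] = "NXprocess"
    process.create_dataset("program", data=program)
    # Assumption: parameters live in a child group named "configuration".
    config = process.create_group("configuration")
    config.attrs["NX_class"] = "NXcollection"
    for key, value in parameters.items():
        config.create_dataset(key, data=value)
    return process
```

The processing results themselves would go into further datasets or groups under each NXprocess.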

@jadball
Contributor Author

jadball commented Jan 16, 2025

We could have one NXprocess entry per file, then a master file that links them all together.
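
A sketch of what such a master file might look like, using HDF5 external links via h5py. The function name `write_master`, the file names and the internal paths are all placeholders. The link targets are stored as relative file names, which HDF5 resolves relative to the folder containing the master file, so the whole folder stays movable.

```python
import h5py


def write_master(master_path, steps):
    """Create a master file whose entry links to NXprocess groups in other files.

    steps maps a step name to (relative_file_name, path_inside_that_file).
    """
    with h5py.File(master_path, "w") as master:
        master.attrs["NX_class"] = "NXroot"
        entry = master.create_group("entry")
        entry.attrs["NX_class"] = "NXentry"
        for name, (filename, h5path) in steps.items():
            # The link is only resolved when it is accessed, so the target
            # file does not need to exist yet when the master is written.
            entry[name] = h5py.ExternalLink(filename, h5path)
```

For example, `write_master("master.h5", {"segmentation": ("peaks.h5", "/entry/segmentation")})` would make `master.h5:/entry/segmentation` point into peaks.h5.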

@jonwright
Member

Here are some links to existing code I could find elsewhere. Being compatible with anyone else looks impossible, but perhaps they offer some inspiration...

For diffractometer geometry and an example of using NeXus for crystallography, Ray Osborn has a lot of good stuff here: https://github.com/nexpy/nxrefine. I just had a look at his code and found:

Elsewhere in Dials they can dump to NXreflections:
https://manual.nexusformat.org/classes/base_classes/NXreflections.html
This would mean figuring out the mapping of "observed_px_x" -> "fc" or "f_raw" and "observed_x" -> "yl", etc., and then adding things we don't find in there or dealing with problems like image flips. Presumably, we can write whatever we like in our "own" group and add a "Nexusformat" group that links to the items we have where the definitions match.
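
A sketch of the "own group plus linked group" idea with h5py soft links. The mapping below only repeats the candidate pairs mentioned above and is not verified; the group names "peaks" and "nexusformat" and the helper `link_matching` are assumptions, and nothing here deals with image flips or unit conventions.

```python
import h5py

# Placeholder mapping from NXreflections field names to our column names.
# These are candidates from the discussion, not a checked correspondence.
CANDIDATE_MAP = {
    "observed_px_x": "fc",
    "observed_x": "yl",
}


def link_matching(h5file, own_group="peaks", nx_group="nexusformat",
                  mapping=CANDIDATE_MAP):
    """Soft-link our columns under NXreflections names; return the unmatched names."""
    own = h5file[own_group]
    nx = h5file.require_group(nx_group)
    nx.attrs["NX_class"] = "NXreflections"
    missing = []
    for nx_name, column in mapping.items():
        if column in own:
            # The data stay in our own group; the NeXus name is only a link.
            nx[nx_name] = h5py.SoftLink(own[column].name)
        else:
            missing.append(nx_name)
    return missing
```

The returned list shows which NXreflections fields still have no counterpart in our own columns.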

For grain orientations, NeXus has ub_matrix[n_comp, 3, 3] within NXsample. This might work for box beam maps if position centroids (and other things) can be added. Perhaps ask Maciej/Carsten whether our UB convention matches spec, bliss, binoculars, etc.

Further digging on file formats brings me to reading distortion files from XDS, presumably in CBF format, for MX and I found this which is importing pycbf that I started in 2005.

There is a data model for ISpyB but I don't think it helps us.

@jonwright
Member

Seen today: the motor positions (frelx, etc.) at which the calibration parameters were fitted need to match the motor positions for the data being processed.

Everything fails when the wrong calibration is used. Today, we don't save any instrument metadata with parameters... Something to fix in the future perhaps?
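
If motor positions were saved alongside the parameters, a check along these lines could catch a wrong calibration early. This is only a sketch: the helper `mismatched_motors`, the dict-based inputs and the tolerance are hypothetical, and frelx is the only motor name taken from the comment above.

```python
import math


def mismatched_motors(calibration, scan, names=("frelx",), tol=1e-3):
    """Return {motor: (calibration_value, scan_value)} for motors that disagree.

    A motor missing from either side is also reported, with None in its place.
    """
    bad = {}
    for name in names:
        cal = calibration.get(name)
        dat = scan.get(name)
        if cal is None or dat is None:
            bad[name] = (cal, dat)
        elif not math.isclose(cal, dat, rel_tol=0.0, abs_tol=tol):
            bad[name] = (cal, dat)
    return bad
```

An empty result means the calibration and the data agree on every listed motor within `tol`.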


3 participants