
HDF5 Plugin Working Group


The HDF5 Plugin Working Group meets monthly, normally on the second Thursday of the month at 10 am Central US time (the usual slot for the HDF5 Working Group meeting).

This meeting is for the sustainability and governance of all HDF5 plugin types: filters/compression, virtual file drivers (VFDs), and virtual object layer (VOL) connectors. Anyone who is interested in this topic or wants to contribute their time and energy is welcome to attend. This meeting is NOT intended for providing technical support. Please use the forum for that.

Zoom link: https://us06web.zoom.us/j/89601195963

The agenda and any cancellations will be posted below and on the forum, usually on Monday. If you have any action items to discuss, please email derobins at hdfgroup dot org to get them added to the agenda. If there are no pressing issues by Monday, the meeting may be cancelled.

14 November 2024

10 October 2024

Agenda

  • Go over existing documentation
  • Create a plan for the 1.16.0 release
    • Are there any special considerations given this will be a major release?
  • Talk about a possible Bit Grooming filter (Here? In the HDF5 library? Is this in NetCDF?)

Notes

Go over existing documentation

  • Should add a table for the maintained filters in the main RELEASE.md file
  • Need to be clear about separation between HDF5 filter code and compression libraries
    • We do save compression library code for some filters
    • Need a policy for this
  • Should probably add a column for 'platforms supported'
  • We need to be clear about what 'not supported' means
  • Should mention the hdf5plugin Python repo and clarify how it differs from this one (talk to the maintainers)
  • Elena says we should look at the binwalk utility to see if it can detect compressed chunks in HDF5 files
  • Should talk to conda, etc. maintainers about their strategy for deploying plugins
  • We should reconsider building more compression algorithms into the HDF5 library
    • Would be handy for users
    • More people would be willing to use the fancier filters
    • Having so many library dependencies would be a pain
  • Discussion from Peter Lindstrom's questions:
    • We should add a filter name lookup table to h5dump, etc. so it emits more useful compression info
    • We should evaluate adding filter names to the filter object header message
    • The filter parameters should be more easily interpreted by the plugins, tools, and users
    • An array of 32-bit unsigned ints is really limiting for the filter parameters (see the sketch after this list)
    • We should re-evaluate the filter interface for HDF5 2.0
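
As context for the parameter discussion above, here is a minimal sketch (not taken from any existing filter) of how parameters reach a filter today via H5Pset_filter. The filter ID and the meaning of the two values are made up for illustration; the point is that everything has to be packed into the cd_values array of 32-bit unsigned ints.

```c
#include "hdf5.h"

#define EXAMPLE_FILTER_ID 32000   /* hypothetical registered filter ID */

herr_t set_example_filter(hid_t dcpl_id)
{
    /* e.g. "level 5" and "block size 4096"; anything that is not naturally a
     * 32-bit unsigned int (floats, strings, 64-bit sizes) has to be bit-cast
     * or split across array elements. */
    const unsigned int cd_values[2] = {5, 4096};

    return H5Pset_filter(dcpl_id, EXAMPLE_FILTER_ID, H5Z_FLAG_OPTIONAL,
                         2, cd_values);
}
```

A filter name recorded in the object header message, as suggested above, would let tools report something more meaningful than the raw ID and cd_values.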

Create a plan for the 1.16.0 release

  • Nothing special necessary for 1.16
  • Will need to come up with a versioning scheme for 2.0.0

Talk about a possible Bit Grooming filter

  • The NetCDF developers say this already exists in NetCDF, implemented in the library rather than as a filter, since the output values are valid IEEE floating-point numbers
  • General agreement that this would be a useful thing to add to the library
  • Should be implemented as a filter with a no-op decompress callback for provenance purposes
  • People usually do bit grooming, then shuffle, then compression (a rough sketch of the quantization step follows this list)
  • More work to add testing than the actual feature
  • Peter Lindstrom suggested this implementation: https://gmd.copernicus.org/articles/14/377/2021/
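
As a rough illustration of the quantization step (an assumption, not the implementation Peter pointed to), bit grooming style filters discard low-order mantissa bits of IEEE 754 values so the data compresses better while remaining valid floats. The sketch below simply zeroes the dropped bits; the published algorithms round more carefully to avoid bias.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Keep the first `keep_bits` (1..23) explicit mantissa bits of each
 * single-precision value and zero the rest. The result is still a valid
 * IEEE 754 float, which is why NetCDF can do this in the library rather
 * than as a filter with a real decompress step. */
void mask_mantissa(float *data, size_t n, unsigned keep_bits)
{
    const uint32_t mask = ~((1u << (23u - keep_bits)) - 1u);

    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &data[i], sizeof bits);  /* safe type-pun */
        bits &= mask;
        memcpy(&data[i], &bits, sizeof bits);
    }
}
```

The quantized data would then typically go through the shuffle filter and a general-purpose compressor, as noted above.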

Bonus discussion - HDF4 NetCDF API removal

  • Announced the removal of the NetCDF API from HDF4
  • Elena says that problems tend to be found via the NetCDF test program and not the main HDF4 tests

19 September 2024

Agenda

  • Are signed plugins working for the 1.14.5 release?

Notes

  • Plugin signing
    • A-OK says Allen
  • BLOSC2 has issues with ARM Mac and won't be included there
    • Allen will hassle Francesc


15 August 2024

Agenda

  • Who needs to be a CODEOWNER?
  • Create a to-do list for CI
    • Why is the existing CI failing and what is the timeline for resolution?
    • Do we do CI here or in the HDF5 repo? Both?
    • Which branches? develop? 1.14? 1.14.x releases?
    • Is the CI sufficient or should it be expanded?
  • Look over existing issues
  • Are all the non-supported filters in the filter list community filters?
    • We should create a directory and README for them
  • What are critical issues for the 1.14.5 release in September?
    • Roughly one month of development time to go, so we need to get any issues sorted ASAP

Notes

  • Don't move the supported filters to a common directory (at least not yet), as this will complicate the CMake code
  • We need to investigate the LZ4 filter to see if it adds extra bytes (issue #134)
  • Allen says the CI will be fixed today
  • CI does two things (a minimal sketch of the first check follows these notes):
    • Check that it can open a dataset in a test file
    • Do a repack and make sure the repacked file can be opened
  • This is pretty minimal and probably not enough
    • Do chunk extraction
    • Test a wider variety of data
    • Test group names?
  • Which branches?
    • HDF5 runs these tests as part of its CI, so tests added here will run there as well
    • develop and latest for-release branch (currently 1.14)
  • Add Elena and Quincey to CODEOWNERS, also HDFG devs as per HDF5
  • Any other outstanding issues?
    • No?
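
For concreteness, a minimal sketch of the first CI check described above (the file and dataset names are hypothetical): open a dataset in a pre-generated test file and read it back, which fails if the required filter plugin cannot be loaded or cannot decode the chunks. The repack step would then run h5repack on the same file and repeat the check.

```c
#include "hdf5.h"
#include <stdlib.h>

int main(void)
{
    /* "test_gzip.h5" and "/dset" are placeholder names for a test file
     * whose chunks were written with the filter under test. */
    hid_t file = H5Fopen("test_gzip.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return EXIT_FAILURE;

    hid_t dset = H5Dopen2(file, "/dset", H5P_DEFAULT);
    if (dset < 0) {
        H5Fclose(file);
        return EXIT_FAILURE;
    }

    int buf[100];  /* assumes the test dataset holds at most 100 ints */
    herr_t status = H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                            H5P_DEFAULT, buf);

    H5Dclose(dset);
    H5Fclose(file);
    return (status < 0) ? EXIT_FAILURE : EXIT_SUCCESS;
}
```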

To-Do

Last week

  • Create basic documentation about purpose of repo, etc.
  • Separate existing filters into "supported" and "community" (community - DONE, supported - pending working CI)
  • Make a list of existing filters that should be added to the "community" section
  • Maintain filter ID list via this repo (list is already in docs, should probably be moved to the root)

This week

  • Need to document the filter parameters each filter uses
  • Create CI badges (Windows, macOS, Linux, develop/1.14, etc.)

18 July 2024

Agenda

  • Governance (where to put docs? The wiki?)
  • Who wants to contribute?
    • Allen, me
  • Repo reorganization
  • Supported platforms and compilers
  • CI - Do we have enough?
  • How do we manage compression libraries? (more challenging than managing filter code)
  • Do we need to add any new plugins for the 1.14.5 release?

Overall Goals

  • One-stop shopping for official filters
  • Maintain high quality filters
    • Alignment with h5py, netCDF, BLOSC, Conda, Debian, other installers
      • How do we deal with MATLAB, IDL, older versions, etc.?
    • Uniform install locations, or at least best practices (see the plugin path sketch after this list)
    • How do we handle versioning?
  • Clearly indicate which filters are "highly supported"
  • Long-term HDF5 file maintenance (storing filter provenance)
  • Improve documentation
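
On install locations: HDF5 looks for dynamically loaded plugins in the directories named by HDF5_PLUGIN_PATH (falling back to a built-in default directory), and applications can adjust the search path at run time. A minimal sketch, with a made-up directory:

```c
#include "hdf5.h"

int main(void)
{
    /* Equivalent to exporting HDF5_PLUGIN_PATH before the program starts;
     * "/opt/hdf5-plugins" is a hypothetical install location. */
    if (H5PLprepend("/opt/hdf5-plugins") < 0)
        return 1;

    /* ... open files whose datasets use dynamically loaded filters ... */
    return 0;
}
```

Agreeing on a small set of such directories, or at least documenting them, is what "uniform install locations" would mean in practice.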

Random notes

  • Many versions of filters in the wild, which is a problem (Elena)
  • Need to support more filters (Elena)
  • Two parts to a filter - the filter itself and the compression library (Allen)
    • This repo is only about filters, not libraries (a sketch of this split follows the list)
  • Need guidance for what constitutes a "proper filter"
    • Code quality, examples, etc.
    • Could go over this as a part of assigning a filter ID number
    • Though this shouldn't be mandatory (some filters may be private)
  • Need to move the filter ID list here
    • Turn it into a matrix w/ person responsible, supported compression library version(s), etc.
  • Cross-platform builds are difficult for some filters
    • Is this okay?
    • Need to document supported compilers and platforms
    • Do we support the Autotools?
      • I hope not (Dana)
  • Two levels of filters
    • Special filters that we carefully maintain and care about (h5py filters, etc.)
    • Community filters that are best-effort
  • What about external software like hsds? How do we help them deal with compressed HDF5 data?
  • Which filters are maintained/community?
    • Community: MAFISC, SZ
    • Maintained: Everything else
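
To make the filter-vs-library split concrete, here is a rough sketch (an assumption, not any existing filter) of what the filter-side code owns: the H5Z_class2_t registration and the callback that HDF5 invokes, with the actual (de)compression delegated to an external library. The filter ID, name, and pass-through behavior are placeholders.

```c
#include "hdf5.h"

#define EXAMPLE_FILTER_ID 32000   /* hypothetical registered filter ID */

/* The callback HDF5 invokes on each chunk; the real work would be a call
 * into a separately maintained compression library. */
static size_t example_filter(unsigned int flags, size_t cd_nelmts,
                             const unsigned int cd_values[], size_t nbytes,
                             size_t *buf_size, void **buf)
{
    (void)cd_nelmts; (void)cd_values; (void)buf_size; (void)buf;

    if (flags & H5Z_FLAG_REVERSE) {
        /* read path: would call the library's decompress function */
        return nbytes;   /* placeholder: pass data through unchanged */
    }
    /* write path: would call the library's compress function */
    return nbytes;       /* placeholder: pass data through unchanged */
}

/* The structure this repo maintains; the compression library it wraps is
 * versioned and shipped separately. */
const H5Z_class2_t EXAMPLE_FILTER_CLASS[1] = {{
    H5Z_CLASS_T_VERS,                 /* H5Z_class_t version            */
    (H5Z_filter_t)EXAMPLE_FILTER_ID,  /* filter ID                      */
    1, 1,                             /* encoder/decoder present flags  */
    "example filter",                 /* name for debugging/error msgs  */
    NULL,                             /* can_apply callback (optional)  */
    NULL,                             /* set_local callback (optional)  */
    example_filter,                   /* the filter callback            */
}};
```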

To-Do

  • Create basic documentation about purpose of repo, etc.
  • Separate existing filters into "highly maintained" and "community"
  • Make a list of existing filters that should be added to the "community" section
  • Maintain filter ID list via this repo