
Unable to open dataset that was created with Blosc(2) compression #1757

Open
Leonard-Mueller opened this issue Feb 11, 2025 · 1 comment
Labels: bug (Something isn't working)

Leonard-Mueller commented Feb 11, 2025

Hi, I created a dataset with the following Python code:

    compression_options = hdf5plugin.Blosc2(
        cname='blosclz',  # Blosc2 supports 'zstd', 'lz4', 'blosclz', etc.
        clevel=9,  # Compression level (0-9)
        filters=hdf5plugin.Blosc2.SHUFFLE,  # Byte shuffle; helps for floating-point data
    )

    if isinstance(da_arr.data, da.Array):
        chunks = da_arr.data.chunksize
        is_dask = True
    else:
        chunks = None
        is_dask = False
    dset = f.create_dataset(
        dset_name,
        shape=da_arr.shape,  # Keep original shape
        dtype=da_arr.dtype,  # Ensure proper data type
        chunks=chunks,  # copy chunksize of underlying dask array
        compression=compression_options,
    )

The H5Web viewer (in VS Code) gives this error:

HDF5-DIAG: Error detected in HDF5 (1.14.2) thread 0:
  #000: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 1061 in H5Dread(): can't synchronously read data
    major: Dataset
    minor: Read failed
  #001: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 1008 in H5D__read_api_common(): can't read data
    major: Dataset
    minor: Read failed
  #002: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 2092 in H5VL_dataset_read_direct(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 2048 in H5VL__dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #004: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLnative_dataset.c line 363 in H5VL__native_dataset_read(): can't read data
    major: Dataset
    minor: Read failed
  #005: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dio.c line 279 in H5D__read(): can't initialize I/O info
    major: Dataset
    minor: Unable to initialize object
  #006: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dchunk.c line 1088 in H5D__chunk_io_init(): unable to create file and memory chunk selections
    major: Dataset
    minor: Unable to initialize object
  #007: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dchunk.c line 1231 in H5D__chunk_io_init_selections(): unable to create file chunk selections
    major: Dataset
    minor: Unable to initialize object
  #008: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dchunk.c line 1692 in H5D__create_piece_file_map_all(): can't insert chunk into skip list
    major: Dataspace
    minor: Unable to insert object
  #009: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5SL.c line 1036 in H5SL_insert(): can't create new skip list node
    major: Skip Lists
    minor: Unable to insert object
  #010: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5SL.c line 709 in H5SL__insert_common(): can't insert duplicate key
    major: Skip Lists
    minor: Unable to insert object
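For context, frames #008–#010 fail while HDF5 builds its per-chunk file selection map, and "can't insert duplicate key" points at the chunk indexing step. One observation (not a confirmed diagnosis): the dataset shape (12000, 12001) reported in the metadata below is not an exact multiple of the (1000, 1000) chunk shape, so the last column of chunks is partial. A quick stdlib-only sketch of the resulting chunk grid:

```python
import math

shape = (12000, 12001)   # dataset shape from the metadata below
chunks = (1000, 1000)    # chunk shape from the metadata below

# Number of chunks along each axis, counting the partial edge chunk.
grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
print(grid)              # (12, 13)

# Extent of the trailing (partial) chunk along each axis.
edge = tuple(s - (g - 1) * c for s, c, g in zip(shape, chunks, grid))
print(edge)              # (1000, 1)
```

So the second axis ends in a 1-element-wide sliver of chunks, which is the kind of edge case a chunk-mapping routine has to handle.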

This is the dataset's metadata as displayed by H5Web:

{
  "name": "slope_map",
  "path": "/slope_map",
  "attributes": [
    {
      "name": "DIMENSION_LIST",
      "shape": [
        2
      ],
      "type": {
        "class": "Array (variable length)",
        "base": {
          "class": "Reference"
        }
      }
    }
  ],
  "kind": "dataset",
  "shape": [
    12000,
    12001
  ],
  "type": {
    "class": "Float",
    "endianness": "little-endian",
    "size": 32
  },
  "chunks": [
    1000,
    1000
  ],
  "filters": [
    {
      "id": 32026,
      "name": "blosc2",
      "cd_values": [
        1,
        0,
        4,
        4000000,
        9,
        1,
        0,
        2,
        1000,
        1000
      ]
    }
  ],
  "rawType": {
    "signed": false,
    "type": 1,
    "vlen": false,
    "littleEndian": true,
    "size": 4,
    "total_size": 144012000
  }
}
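As a side note, the `cd_values` above look internally consistent with the rest of the metadata. Assuming they follow the conventional Blosc-style field layout (version, reserved, typesize, chunk size in bytes, clevel, shuffle, compressor code, then rank and chunk dimensions — an assumption about the filter's convention, not something quoted from its spec), the values line up with the writer code:

```python
cd_values = [1, 0, 4, 4000000, 9, 1, 0, 2, 1000, 1000]

# Assumed field layout (Blosc-style convention; not taken from the filter spec):
typesize = cd_values[2]             # 4 bytes -> matches the float32 dataset
chunk_bytes = cd_values[3]          # 4_000_000
clevel = cd_values[4]               # 9, as requested in the writer code
shuffle = cd_values[5]              # 1 -> SHUFFLE
rank = cd_values[7]                 # 2
chunk_dims = cd_values[8:8 + rank]  # [1000, 1000]

# Consistency check: chunk size in bytes equals prod(dims) * typesize.
assert chunk_bytes == typesize * chunk_dims[0] * chunk_dims[1]
print(typesize, chunk_bytes, clevel, chunk_dims)
```

Under that reading, nothing in the filter parameters themselves is obviously wrong.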

With gzip compression, for instance, it works just fine. I can also read the dataset back with the h5py library. Should H5Web be able to read Blosc and Blosc2 compressed datasets out of the box?

Leonard-Mueller added the bug label on Feb 11, 2025
axelboc (Contributor) commented Feb 12, 2025

Hi @Leonard-Mueller, the VS Code extension supports Blosc2 compression out of the box:

import h5py
import numpy
import hdf5plugin

with h5py.File('blosc2.h5', 'w') as file:
  file.create_dataset(
    'blosc2',
    data=numpy.arange(100),
    compression=hdf5plugin.Blosc2(
      cname='blosclz',
      clevel=9,
      filters=hdf5plugin.Blosc2.SHUFFLE
    )
  )

blosc2.zip

[Screenshot attachment]

So I don't think the problem comes from the compression per se. Perhaps your dataset is somehow malformed, and the Blosc2 filter is less lenient toward it than the gzip filter?
