read_zarr failed in rust: incompatible fill value 0 for data type string #15
Comments
The zarr backend is still in beta. It is not guaranteed to be compatible with zarr files generated by other programs. Zarr development is not a priority for me, because one of the main issues with zarr is that it generates many small files, which degrades hard drive performance. Because of this, zarr is less convenient to use than HDF5.
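To put a number on the "many small files" point, here is a minimal sketch that counts the files under a directory-based zarr store and reports their average size. It uses only the standard library, and `summarize_store` is a hypothetical helper. It assumes a plain zarr v2 directory store, where every chunk and every `.zarray`/`.zattrs`/`.zgroup` metadata document is its own file. A store using v3 sharding, or one packed into a single archive, would look different.

```python
import os


def summarize_store(path):
    """Count the files under a directory-based zarr store and their total size.

    Hypothetical helper: in a zarr v2 directory store each chunk and each
    metadata document is a separate file, so the file count grows with the
    number of chunks. Returns (file_count, total_bytes, mean_bytes).
    """
    n_files = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            n_files += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    # Avoid dividing by zero for an empty directory
    mean_bytes = total_bytes / n_files if n_files else 0.0
    return n_files, total_bytes, mean_bytes
```

For example, `summarize_store("ZZM_panNK_test5.zarr")` returns a `(file_count, total_bytes, mean_bytes)` tuple, which can be compared with the single `.h5ad` file holding the same data.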
I see. I can help if you can give me some hints or directions to work with. Zarr support is important because it does not depend on the HDF5 C library, which makes anndata-rs more portable across platforms.
@zqfang that error is raised by
@kaizhang that could be addressed with the
@LDeakin, thank you so much for your insights! You saved my day! So the issue is: in zarr v2 files written by anndata, string arrays have a fill value of `0`, and zarrs does not allow this for a string data type. I changed the fill value as follows:

```python
import base64

import zarr

# Change the fill value 0 to its Base64 encoding, 'MA=='.
# See the fill value encoding section of the v2 spec:
# https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html
# "the fill value MUST be encoded as an ASCII string using the standard Base64 alphabet"
print(base64.b64encode(b"0").decode())  # prints MA==

# "r+" opens an existing store for reading and writing
store = zarr.open("ZZM_panNK_test5.zarr", mode="r+")
# Update the fill value; obs/_index is just one of the string arrays
store["obs/_index"].fill_value = "MA=="
```
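The snippet above patches only `obs/_index`, but an AnnData zarr store usually contains several string arrays (for example `var/_index` alongside `obs/_index`). A minimal sketch, assuming a zarr v2 directory store where anndata wrote string arrays with dtype `"|O"` and `fill_value` 0, is to walk the store and edit each `.zarray` JSON document directly with the standard library. `patch_string_fill_values` is a hypothetical helper, and `"MA=="` is simply the value from the workaround above, not something verified against zarrs. If the store has consolidated metadata (a `.zmetadata` file at its root), this sketch does not update that copy.

```python
import json
import os


def patch_string_fill_values(store_path, new_fill="MA==", dry_run=False):
    """Rewrite fill_value 0 on object-dtype (string) arrays in a zarr v2 directory store.

    Hypothetical helper: edits every .zarray document whose dtype is "|O" and
    whose fill_value is 0. Returns the sorted array paths (relative to the
    store, with "/" separators) that were, or with dry_run would be, changed.
    """
    patched = []
    for root, _dirs, files in os.walk(store_path):
        if ".zarray" not in files:
            continue
        meta_path = os.path.join(root, ".zarray")
        with open(meta_path, encoding="utf-8") as f:
            meta = json.load(f)
        # Only object (string) arrays whose fill value is 0 are touched
        if meta.get("dtype") == "|O" and meta.get("fill_value") == 0:
            rel = os.path.relpath(root, store_path).replace(os.sep, "/")
            patched.append(rel)
            if not dry_run:
                meta["fill_value"] = new_fill
                with open(meta_path, "w", encoding="utf-8") as f:
                    json.dump(meta, f, indent=4, sort_keys=True)
    return sorted(patched)
```

Running it with `dry_run=True` first lists the arrays that would change without writing anything.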
Just to chime in here: zarr v3 is still missing a number of features "needed" for anndata (see https://zarr.readthedocs.io/en/stable/user-guide/v3_migration.html#work-in-progress). But we can definitely start hacking, and maybe provide our own fill-in implementations, although the first step was simply upgrading the Python package before the format. For example, I'm not sure how important things like structured arrays really are, and even so, we might be able to write a Python codec or something similar to provide a bridge. I'm excited to try out sharding as well :)
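Whether structured arrays matter for a given dataset can be checked directly. In zarr v2 metadata, a structured dtype is written as a JSON list of `[field_name, type]` entries (optionally `[field_name, type, shape]`) instead of a single type string such as `"<f4"`. A minimal sketch can therefore scan every `.zarray` document and report the arrays that use one, along with their field names. `find_structured_arrays` is a hypothetical helper, and it only handles zarr v2 directory stores.

```python
import json
import os


def find_structured_arrays(store_path):
    """List (array_path, field_names) for structured-dtype arrays in a zarr v2 directory store.

    Hypothetical helper: a v2 structured dtype is a JSON list of
    [field_name, type] or [field_name, type, shape] entries, whereas a plain
    dtype is a single string.
    """
    found = []
    for root, _dirs, files in os.walk(store_path):
        if ".zarray" not in files:
            continue
        with open(os.path.join(root, ".zarray"), encoding="utf-8") as f:
            dtype = json.load(f).get("dtype")
        if isinstance(dtype, list):
            rel = os.path.relpath(root, store_path).replace(os.sep, "/")
            # The first element of each entry is the field name
            found.append((rel, [field[0] for field in dtype]))
    return sorted(found)
```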
I just checked to be sure, and most of the failures with zarr file format version 3 are related to structured arrays. So if this package does not rely on them (not sure how it would since I don't think
Hi Kai,
Thank you so much for the amazing implementation of anndata in Rust. It helps a lot with my research.
I need your help with reading a zarr file.
I've been testing zarr input; however, reading it fails with the error message below. I can't figure out what the error means, and I hope you can help me with this.
The code I used was:
I used scanpy to save the h5ad file as zarr. The h5ad file works great in Rust, but the zarr file does not.
Please see the zarr file I used.
ZZM_panNK_test5.zarr.zip