Adds to_bytes() to AsyncFsspecStore _cat_file method #107

Merged (1 commit) on Dec 11, 2024


norlandrhagen
Contributor

Hi there!

I was wondering if you could use obstore to read a file into Xarray.

When passing the AsyncFsspecStore into Xarray, I was getting errors from the h5netcdf backend: it reads the first few bytes to determine whether the file is a valid NetCDF file, and it was getting a Buffer instead of bytes.

This PR just calls the to_bytes() method on the arro3 Buffer in the _cat_file method of AsyncFsspecStore.

AttributeError: 'arro3.core._core.Buffer' object has no attribute 'startswith'

    417 def open_dataset(
    418     self,
    419     filename_or_obj: str | os.PathLike[Any] | ReadBuffer | AbstractDataStore,
   (...)
    436     storage_options: dict[str, Any] | None = None,
    437 ) -> Dataset:
    438     filename_or_obj = _normalize_path(filename_or_obj)
--> 439     store = H5NetCDFStore.open(
    440         filename_or_obj,
    441         format=format,
    442         group=group,
    443         lock=lock,
    444         invalid_netcdf=invalid_netcdf,
    445         phony_dims=phony_dims,
    446         decode_vlen_strings=decode_vlen_strings,
    447         driver=driver,
    448         driver_kwds=driver_kwds,
    449         storage_options=storage_options,
    450     )
    452     store_entrypoint = StoreBackendEntrypoint()
    454     ds = store_entrypoint.open_dataset(
    455         store,
    456         mask_and_scale=mask_and_scale,
   (...)
    462         decode_timedelta=decode_timedelta,
    463     )

File ~/miniforge3/envs/obstore/lib/python3.12/site-packages/xarray/backends/h5netcdf_.py:170, in H5NetCDFStore.open(cls, filename, mode, format, group, lock, autoclose, invalid_netcdf, phony_dims, decode_vlen_strings, driver, driver_kwds, storage_options)
    168 elif isinstance(filename, io.IOBase):
    169     magic_number = read_magic_number_from_file(filename)
--> 170     if not magic_number.startswith(b"\211HDF\r\n\032\n"):
    171         raise ValueError(
    172             f"{magic_number!r} is not the signature of a valid netCDF4 file"
    173         )
    175 if format not in [None, "NETCDF4"]:

AttributeError: 'arro3.core._core.Buffer' object has no attribute 'startswith'
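
The failure mode can be reproduced without obstore or h5netcdf. The sketch below is only an illustration: it uses Python's built-in `memoryview` as a stand-in for the arro3 `Buffer` (an assumption; the real `Buffer` is a different type), and `is_hdf5` is a hypothetical helper that mirrors the magic-number check shown in the traceback.

```python
# HDF5 signature, as it appears in the traceback above
HDF5_MAGIC = b"\211HDF\r\n\032\n"


def is_hdf5(header) -> bool:
    """Hypothetical helper mirroring the startswith() check in the traceback."""
    return header.startswith(HDF5_MAGIC)


raw = HDF5_MAGIC + b"\x00" * 8
buffer_like = memoryview(raw)  # stand-in for a buffer object that is not bytes

try:
    is_hdf5(buffer_like)
    failed = False
except AttributeError:
    # Like the Buffer in the traceback, memoryview has no .startswith
    failed = True

# Copying the buffer into Python bytes restores the bytes API
ok = is_hdf5(bytes(buffer_like))
```

Any consumer that calls bytes-only methods such as `startswith` hits the same error, which is why converting once at the adapter boundary is enough.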

With some help from @martindurant, this now works:

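from pathlib import Path

import xarray as xr
from obstore.fsspec import AsyncFsspecStore
from obstore.store import LocalStore

# Wrap a local obstore store in the fsspec adapter, then open a NetCDF file through it
store = LocalStore(prefix=Path("."))
fss = AsyncFsspecStore(store)
ds = xr.open_dataset(fss.open('air.nc'), engine='h5netcdf', chunks={})
ds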

* It also seemed to work with an S3 store.

@kylebarron
Member

I figure fsspec consumers generally expect any returned data to be Python bytes, so this is fine to do in the fsspec adapter.
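
A minimal sketch of that design choice, assuming a store whose reads return a non-bytes buffer. `FakeStore`, its `get_range` method, and `BytesAdapter` are hypothetical names for illustration and are not obstore's actual implementation; the sketch uses `bytes(buf)` where the PR calls `to_bytes()` on the arro3 `Buffer`.

```python
import asyncio


class FakeStore:
    """Hypothetical store whose reads return a buffer rather than bytes."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files

    async def get_range(self, path: str, start, end):
        # memoryview stands in for a zero-copy buffer type
        return memoryview(self._files[path])[start:end]


class BytesAdapter:
    """Hypothetical fsspec-style adapter that always hands back Python bytes."""

    def __init__(self, store: FakeStore) -> None:
        self.store = store

    async def _cat_file(self, path: str, start=None, end=None) -> bytes:
        buf = await self.store.get_range(path, start, end)
        # Convert at the adapter boundary so callers can rely on the bytes API
        return bytes(buf)


store = FakeStore({"air.nc": b"\211HDF\r\n\032\n" + b"payload"})
adapter = BytesAdapter(store)
header = asyncio.run(adapter._cat_file("air.nc", 0, 8))
```

The conversion costs one copy per read, but it keeps the buffer type internal to the adapter, so downstream libraries never see it.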

@kylebarron kylebarron merged commit 39819c8 into developmentseed:main Dec 11, 2024
4 checks passed
@norlandrhagen norlandrhagen deleted the fsspec_to_bytes branch December 11, 2024 23:44
@kylebarron
Member

This is included in 0.3.0-beta.9, and I'd like to get an 0.3 release out soon

@norlandrhagen
Contributor Author

Thanks @kylebarron!
