When a large ndarray is stored as a binary block with compression, the whole block (or at least its beginning) needs to be read and decompressed even when only a small subarray is requested. "Chunking" remedies this: instead of storing an ndarray as a single binary block, it is stored as a set of smaller blocks that are compressed and stored independently.
Are there plans to support this? Can this be implemented as an extension?
One simple approach would be to introduce a new YAML tag core/chunked-ndarray whose value is a sequence of chunks, each pairing an offset with an ndarray, for example:
chunky: !core/chunked-ndarray-1.0.0
  - !core/ndarray-chunk-1.0.0
    offset: [0, 0]
    data: !core/ndarray-1.0.0
      source: ...  # the usual ndarray stuff here
  - !core/ndarray-chunk-1.0.0
    offset: [100, 0]
    data: !core/ndarray-1.0.0
      source: ...  # the usual ndarray stuff here
  - !core/ndarray-chunk-1.0.0
    offset: [0, 100]
    data: !core/ndarray-1.0.0
      source: ...  # the usual ndarray stuff here
  # possibly more chunks here
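To make the proposed layout concrete, a reader for such a tag could reassemble the full array from the (offset, data) pairs. A minimal NumPy sketch (the function name and chunk representation are hypothetical, not part of any existing ASDF API):

```python
import numpy as np

def assemble_chunks(shape, chunks, dtype=float):
    """Reassemble a full array from (offset, data) chunk pairs.

    `chunks` is a list of (offset, ndarray) pairs, mirroring the
    hypothetical core/ndarray-chunk entries sketched above.
    """
    full = np.zeros(shape, dtype=dtype)
    for offset, data in chunks:
        # Place each chunk at its offset within the full array.
        index = tuple(slice(o, o + n) for o, n in zip(offset, data.shape))
        full[index] = data
    return full

# Three 100x100 chunks at the offsets from the example above.
chunks = [
    ((0, 0), np.ones((100, 100))),
    ((100, 0), np.full((100, 100), 2.0)),
    ((0, 100), np.full((100, 100), 3.0)),
]
full = assemble_chunks((200, 200), chunks)
```

A real implementation would of course read (and decompress) only the chunks overlapping the requested subarray, rather than materializing the whole array as done here.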
Has there been any work in this direction?
There has been some work adding support for the zarr storage format within ASDF. This is implemented via an extension: https://github.com/asdf-format/asdf-zarr It's a new package, so please let me know if it's something you plan to use "in production" (so we can give it another review); also, feel free to give it a try and open issues if you find anything. The extension offers a few options:
storing the zarr data inside ASDF blocks (one chunk per block; I think this is most similar to what you described)
referencing external zarr storage (a DirectoryStore of "flat files", an S3 store, or any of the many store formats zarr supports).
The use of zarr also opens up a second place where compression can be controlled (which can get a bit confusing).
@braingram Nice! We are currently discussing storage formats, and both ASDF and Zarr are contenders that have various advantages and disadvantages. On the surface, using Zarr chunking with ASDF single-file storage seems like an excellent choice. I will have a look.