-
Hi everyone, I'm trying to gain a more detailed understanding of how to balance the tug-of-war between performance and storage requirements with Zarr. Better compression of course leads to smaller stored chunks, which reduces the storage requirements. To balance this with high-performance data access, the way I see it you would need to find an appropriate compression level, i.e. one based on your specific application's bottlenecks.
To better understand how these two could be balanced, I was wondering whether the LRUStoreCache holds compressed or decompressed chunks, i.e. whether the data are decompressed again on every read even when a chunk is already cached. I intend to run a small experiment to test this (sketched below), but if someone already knows the answer that would be great!
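Here's a minimal sketch of the experiment I have in mind, assuming zarr-python v2 with the Blosc codec from numcodecs and a local directory store (paths are made up):

```python
import shutil
import time

import numpy as np
import zarr
from numcodecs import Blosc

# Compressible test data; real results depend heavily on your data.
data = np.arange(2048 * 2048, dtype="f8").reshape(2048, 2048)

for clevel in (1, 5, 9):
    path = f"clevel_{clevel}.zarr"
    shutil.rmtree(path, ignore_errors=True)

    # Write the same data at different compression levels.
    z = zarr.open(
        path,
        mode="w",
        shape=data.shape,
        chunks=(256, 256),
        dtype=data.dtype,
        compressor=Blosc(cname="zstd", clevel=clevel),
    )
    z[:] = data

    # Time a full read and report on-disk size. Note the OS page cache
    # will flatter these numbers; a cold cache or remote store would differ.
    t0 = time.perf_counter()
    _ = z[:]
    print(f"clevel={clevel}: read {time.perf_counter() - t0:.3f}s, "
          f"stored={z.nbytes_stored} bytes")
```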
-
The Store interface in Zarr Python returns compressed bytes from storage. LRUStoreCache wraps one (high latency) store with another store which caches those bytes in memory, exposing the same interface to the rest of the library. So yes, the data are decompressed every time you read a chunk, even if that chunk is cached.
There are definitely scenarios where caching decompressed data would be preferable. That would be an interesting direction to explore.
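For reference, using the cache looks roughly like this (zarr-python v2 API; the path is made up):

```python
import zarr

# A high-latency store; a local path stands in for e.g. object storage.
store = zarr.DirectoryStore("data/example.zarr")

# Wrap it in an in-memory LRU cache holding up to ~256 MiB of chunk bytes.
cache = zarr.LRUStoreCache(store, max_size=2**28)

z = zarr.open(cache, mode="r")

# The first read fetches from the underlying store; the second is served
# from the cache -- but both still decompress the cached compressed bytes.
part = z[0:100]
part_again = z[0:100]
```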
-
If we really care about performance, we should think about caching decompressed chunks in zarr v3, as well as caching compressed chunks. |
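As a rough, hypothetical sketch of that idea at the user level (nothing like this exists in zarr today; the function name and path are made up), one could cache decompressed chunks above the array instead of caching bytes below it:

```python
from functools import lru_cache

import numpy as np
import zarr

z = zarr.open("data/example.zarr", mode="r")  # assume a 2-D array

@lru_cache(maxsize=128)
def get_chunk(i: int, j: int) -> np.ndarray:
    # Reading a whole chunk decompresses it once; lru_cache then keeps the
    # decompressed ndarray, so repeat access skips decompression entirely.
    ci, cj = z.chunks
    return z[i * ci : (i + 1) * ci, j * cj : (j + 1) * cj]

a = get_chunk(0, 0)  # decompressed on first access
b = get_chunk(0, 0)  # returned from the decompressed-chunk cache
```

The obvious trade-off is memory: decompressed chunks can be many times larger than the compressed bytes the current cache holds.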