-
Hi everyone, I'm trying to gain a more detailed understanding of how to balance the tug-of-war between performance and storage requirements with Zarr. Better compression of course leads to smaller stored chunks, which reduces the storage requirements. To balance this with high-performance data access, the way I see it you would need to find an appropriate compression level, i.e. one based on your specific application's bottlenecks.
To better understand how these two could be balanced, I was wondering whether the LRUStoreCache holds compressed or decompressed chunks, i.e. whether the data are decompressed again on every read even when a chunk is already cached. I intend to run a small experiment to test this (sketched below), but if someone already knows the answer that would be great!
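Here's a minimal sketch of the experiment I have in mind, assuming zarr-python v2 with the Blosc codec from numcodecs and a local directory store (paths are made up):

```python
import shutil
import time

import numpy as np
import zarr
from numcodecs import Blosc

# Compressible test data; real results depend heavily on your data.
data = np.arange(2048 * 2048, dtype="f8").reshape(2048, 2048)

for clevel in (1, 5, 9):
    path = f"clevel_{clevel}.zarr"
    shutil.rmtree(path, ignore_errors=True)

    # Write the same data at different compression levels.
    z = zarr.open(
        path,
        mode="w",
        shape=data.shape,
        chunks=(256, 256),
        dtype=data.dtype,
        compressor=Blosc(cname="zstd", clevel=clevel),
    )
    z[:] = data

    # Time a full read and report on-disk size. Note the OS page cache
    # will flatter these numbers; a cold cache or remote store would differ.
    t0 = time.perf_counter()
    _ = z[:]
    print(f"clevel={clevel}: read {time.perf_counter() - t0:.3f}s, "
          f"stored={z.nbytes_stored} bytes")
```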
-
The Store interface in Zarr Python returns compressed bytes from storage. LRUStoreCache wraps one (high latency) store with another store which caches those bytes in memory, exposing the same interface to the rest of the library. So yes, the data are decompressed every time you read a chunk, even if that chunk is cached.
There are definitely scenarios where caching decompressed data would be preferable. That would be an interesting direction to explore.
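For reference, using the cache looks roughly like this (zarr-python v2 API; the path is made up):

```python
import zarr

# A high-latency store; a local path stands in for e.g. object storage.
store = zarr.DirectoryStore("data/example.zarr")

# Wrap it in an in-memory LRU cache holding up to ~256 MiB of chunk bytes.
cache = zarr.LRUStoreCache(store, max_size=2**28)

z = zarr.open(cache, mode="r")

# The first read fetches from the underlying store; the second is served
# from the cache -- but both still decompress the cached compressed bytes.
part = z[0:100]
part_again = z[0:100]
```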
-
If we really care about performance, we should think about caching decompressed chunks in zarr v3, as well as caching compressed chunks. |
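As a rough, hypothetical sketch of that idea at the user level (nothing like this exists in zarr today; the function name and path are made up), one could cache decompressed chunks above the array instead of caching bytes below it:

```python
from functools import lru_cache

import numpy as np
import zarr

z = zarr.open("data/example.zarr", mode="r")  # assume a 2-D array

@lru_cache(maxsize=128)
def get_chunk(i: int, j: int) -> np.ndarray:
    # Reading a whole chunk decompresses it once; lru_cache then keeps the
    # decompressed ndarray, so repeat access skips decompression entirely.
    ci, cj = z.chunks
    return z[i * ci : (i + 1) * ci, j * cj : (j + 1) * cj]

a = get_chunk(0, 0)  # decompressed on first access
b = get_chunk(0, 0)  # returned from the decompressed-chunk cache
```

The obvious trade-off is memory: decompressed chunks can be many times larger than the compressed bytes the current cache holds.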