
Switch to zstd compression #49

Open
nicoburns opened this issue Feb 18, 2025 · 2 comments

Comments

nicoburns commented Feb 18, 2025

zstd is a newer compression algorithm that achieves compression ratios similar to xz (slightly worse, but within 5-10%) and decompresses significantly faster. As xz decompression is currently the bottleneck when recomputing scores for old runs, we should consider switching to zstd (subject to testing showing that it actually results in an improvement).

We'd likely want to use https://github.com/gyscos/zstd-rs for this.
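
For context, a minimal sketch of that crate's one-shot API; the compression level and input data here are purely illustrative, not what we'd necessarily use:

```rust
// Rough sketch of the zstd crate's one-shot encode/decode API.
// Level 19 and the raw byte input are illustrative assumptions.
fn roundtrip(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let compressed = zstd::encode_all(data, 19)?;
    let decompressed = zstd::decode_all(compressed.as_slice())?;
    Ok(decompressed)
}
```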

nicoburns (Contributor, Author) commented:

Ok, some initial results from this. zstd is faster to decompress, taking roughly 60% of the time that xz does. However, the bigger win was decompressing to a String and then JSON-decoding it with from_str, rather than JSON-decoding directly from the decompression stream with from_reader. Some representative numbers (variance across runs was small):

              xz       zstd
from_reader   1600ms   631ms
from_str      207ms    136ms

This is promising and likely means we can get a significant speedup from switching to a Rust implementation of scoring.
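
For reference, this is roughly the shape of the two approaches being compared; the type name, error handling, and the assumption that serde_json is doing the decoding are illustrative rather than the project's actual code:

```rust
use std::error::Error;
use std::fs::File;
use std::io::Read;

// Placeholder for the real run-report structure.
#[derive(serde::Deserialize)]
struct RunData {}

// Slower in these tests: serde_json pulls bytes from the
// decompression stream directly.
fn decode_from_reader(path: &str) -> Result<RunData, Box<dyn Error>> {
    let decoder = zstd::stream::read::Decoder::new(File::open(path)?)?;
    Ok(serde_json::from_reader(decoder)?)
}

// Faster in these tests: decompress the whole file into a String
// first, then parse the in-memory buffer.
fn decode_from_str(path: &str) -> Result<RunData, Box<dyn Error>> {
    let mut decoder = zstd::stream::read::Decoder::new(File::open(path)?)?;
    let mut json = String::new();
    decoder.read_to_string(&mut json)?;
    Ok(serde_json::from_str(&json)?)
}
```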

nicoburns (Contributor, Author) commented:

Some notes:

  • Compression level 22 (passed to the CLI as -22 and enabled with the --ultra flag) puts us very close to xz in terms of compression ratio (the largest files are ~3.3 MB rather than ~3.1 MB).
  • Level 22 is quite slow to compress (~40s on my machine), but the resulting files are still fast to decompress. Since we compress one file per day but decompress all of them, that's probably a good trade-off. (A rough sketch of level-22 compression with the Rust crate follows this list.)
  • I tried using dictionary compression but it didn't help much. It supposedly mostly helps on small files, so that makes sense. I feel like it ought to work on larger files with a larger dictionary, but the tools don't seem to support that use case.
  • The compression ratios we're getting here are pretty great. The files that are around 3MB compressed are around 100MB uncompressed!
  • The Rust scoring rewrite probably makes us fast enough without switching compression algorithms. However, it might still be nice to switch.
  • I think it might be nice to have a repo that only contains the runs, and move everything else (scoring logic and website) elsewhere. If we make that change, it might be a good chance to start a new repo and switch the compression algorithm. I would suggest:
    • The scoring logic could move to a new Rust-based CLI tool (like the one at https://github.com/nicoburns/wptreport), which would also allow it to be run locally.
    • The website could be merged into the main Servo website.
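
If we do switch, compressing a day's run file at level 22 from Rust might look roughly like the sketch below (paths are placeholders; the CLI equivalent is `zstd --ultra -22`):

```rust
use std::fs::File;
use std::io;

// Hedged sketch: compress one day's run file at zstd level 22.
// Input/output paths are placeholders, not the project's layout.
fn compress_run(input: &str, output: &str) -> io::Result<()> {
    let mut reader = File::open(input)?;
    let writer = File::create(output)?;
    // 22 is the maximum level (the CLI gates levels above 19 behind --ultra).
    let mut encoder = zstd::stream::write::Encoder::new(writer, 22)?;
    io::copy(&mut reader, &mut encoder)?;
    // finish() writes the final frame and returns the inner writer.
    encoder.finish()?;
    Ok(())
}
```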
