
Switch to zstd compression #49

Open
nicoburns opened this issue Feb 18, 2025 · 2 comments

Comments

nicoburns commented Feb 18, 2025

zstd is a newer compression algorithm that achieves compression ratios similar to xz (slightly worse, but within 5-10%) and decompresses significantly faster. As xz decompression is currently the bottleneck when recomputing scores for old runs, we should consider switching to zstd (subject to testing showing that it actually results in an improvement).

We'd likely want to use https://github.com/gyscos/zstd-rs for this.
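
For context, a minimal sketch of that crate's one-shot API; the compression level and input data here are purely illustrative, not what we'd necessarily use:

```rust
// Rough sketch of the zstd crate's one-shot encode/decode API.
// Level 19 and the raw byte input are illustrative assumptions.
fn roundtrip(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let compressed = zstd::encode_all(data, 19)?;
    let decompressed = zstd::decode_all(compressed.as_slice())?;
    Ok(decompressed)
}
```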

nicoburns (Contributor, Author) commented:

Ok, some initial results from this. zstd is faster to decompress, taking roughly 60% of the time that xz does. However, the bigger win was decompressing to a String and then JSON-decoding it with from_str, rather than JSON-decoding directly from the decompression stream with from_reader. Some representative numbers (variance across runs was small):

              xz       zstd
from_reader   1600ms   631ms
from_str      207ms    136ms

This is promising and likely means we can get a significant speedup from switching to a Rust implementation of scoring.
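
For reference, this is roughly the shape of the two approaches being compared; the type name, error handling, and the assumption that serde_json is doing the decoding are illustrative rather than the project's actual code:

```rust
use std::error::Error;
use std::fs::File;
use std::io::Read;

// Placeholder for the real run-report structure.
#[derive(serde::Deserialize)]
struct RunData {}

// Slower in these tests: serde_json pulls bytes from the
// decompression stream directly.
fn decode_from_reader(path: &str) -> Result<RunData, Box<dyn Error>> {
    let decoder = zstd::stream::read::Decoder::new(File::open(path)?)?;
    Ok(serde_json::from_reader(decoder)?)
}

// Faster in these tests: decompress the whole file into a String
// first, then parse the in-memory buffer.
fn decode_from_str(path: &str) -> Result<RunData, Box<dyn Error>> {
    let mut decoder = zstd::stream::read::Decoder::new(File::open(path)?)?;
    let mut json = String::new();
    decoder.read_to_string(&mut json)?;
    Ok(serde_json::from_str(&json)?)
}
```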

nicoburns (Contributor, Author) commented:

Some notes:

  • Compression level 22 (passed to the CLI as -22 and enabled with the --ultra flag) puts us very close to xz in terms of compression ratio (the largest files are ~3.3 MB rather than ~3.1 MB).
  • Level 22 is quite slow to compress (~40s on my machine), but the resulting files are still fast to decompress. Since we compress one file per day but decompress all of them, that's probably a good trade-off. (A rough sketch of level-22 compression with the Rust crate follows this list.)
  • I tried using dictionary compression but it didn't help much. It supposedly mostly helps on small files, so that makes sense. I feel like it ought to work on larger files with a larger dictionary, but the tools don't seem to support that use case.
  • The compression ratios we're getting here are pretty great. The files that are around 3MB compressed are around 100MB uncompressed!
  • The Rust scoring rewrite probably makes us fast enough without switching compression algorithms. However, it might still be nice to switch.
  • I think it might be nice to have a repo that only contains the runs, and move everything else (scoring logic and website) elsewhere. If we make that change, it might be a good chance to start a new repo and switch the compression algorithm. I would suggest:
    • The scoring logic could move to a new Rust-based CLI tool (like the one at https://github.com/nicoburns/wptreport), which would also allow it to be run locally.
    • The website could be merged into the main Servo website.
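
If we do switch, compressing a day's run file at level 22 from Rust might look roughly like the sketch below (paths are placeholders; the CLI equivalent is `zstd --ultra -22`):

```rust
use std::fs::File;
use std::io;

// Hedged sketch: compress one day's run file at zstd level 22.
// Input/output paths are placeholders, not the project's layout.
fn compress_run(input: &str, output: &str) -> io::Result<()> {
    let mut reader = File::open(input)?;
    let writer = File::create(output)?;
    // 22 is the maximum level (the CLI gates levels above 19 behind --ultra).
    let mut encoder = zstd::stream::write::Encoder::new(writer, 22)?;
    io::copy(&mut reader, &mut encoder)?;
    // finish() writes the final frame and returns the inner writer.
    encoder.finish()?;
    Ok(())
}
```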
