Evaluate zlib-cloudflare for 15% performance speedup of WarcRecordWriter #22

Open
tfmorris opened this issue Jul 14, 2023 · 4 comments

Comments

@tfmorris

tfmorris commented Jul 14, 2023

According to this 2019 analysis, fully 1/3 of WarcRecordWriter's time is spent in zlib.so. Cloudflare has a performance-enhanced, drop-in-compatible version of zlib, zlib-cloudflare, which is claimed to be almost twice as fast at gzip compression.

This could provide a significant speedup (~15% overall) for minimal implementation cost. There is documentation describing how to set it up; ignore the fact that it's a Graviton page, since the instructions apply to all architectures.
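One way to arrive at the ~15% figure is a back-of-the-envelope Amdahl's-law calculation. This assumes the 2019 profile still applies (a fraction p = 1/3 of the time is spent in zlib) and that only that portion speeds up, by the claimed factor s ≈ 2:

```math
T' = (1 - p) + \frac{p}{s} = \frac{2}{3} + \frac{1}{3 \cdot 2} = \frac{2}{3} + \frac{1}{6} = \frac{5}{6} \approx 0.83
```

At s = 2 that is about 17% less wall-clock time (an overall speedup of 1/T' = 1.2). Since the claim is "almost" twice as fast, s = 1.8 gives T' = 2/3 + 1/5.4 ≈ 0.85, which is roughly the 15% quoted.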

While switching to a different algorithm is also possible, that would be much more disruptive to the ecosystem than a drop-in replacement implementing the same algorithm.
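To illustrate why a drop-in zlib is invisible to consumers: `.warc.gz` files commonly store each record as its own gzip member, concatenated, and any zlib build that implements DEFLATE emits members that any gzip reader can decode. The compressed bytes from a faster fork may not be byte-identical to stock zlib's output, but the decompressed content is the same. A minimal sketch, using Python's `zlib`/`gzip` modules as a stand-in for whatever zlib the writer links against; `write_members`, `read_members`, and the record bytes are hypothetical and for illustration only:

```python
import gzip
import zlib


def write_members(records: list[bytes], level: int = 6) -> bytes:
    """Compress each record as a separate gzip member and concatenate them,
    mirroring the per-record layout commonly used in .warc.gz files."""
    return b"".join(gzip.compress(r, compresslevel=level) for r in records)


def read_members(blob: bytes) -> list[bytes]:
    """Decode concatenated gzip members one at a time, as a record reader would."""
    out = []
    while blob:
        d = zlib.decompressobj(31)  # 16 + MAX_WBITS: expect gzip header/trailer
        out.append(d.decompress(blob) + d.flush())
        blob = d.unused_data  # bytes belonging to the next member, if any
    return out


if __name__ == "__main__":
    # Illustrative stand-ins for WARC records, not complete WARC headers.
    records = [
        b"WARC/1.0\r\nWARC-Type: warcinfo\r\n\r\n",
        b"WARC/1.0\r\nWARC-Type: response\r\n\r\n",
    ]
    blob = write_members(records)
    assert read_members(blob) == records
    # The stdlib reader also accepts multi-member input in one call.
    assert gzip.decompress(blob) == b"".join(records)
```

Swapping the underlying zlib changes only how fast `write_members` runs, not what either reader sees.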

@chris-ha458

I feel that if compression tools are up for consideration, zstd should also be a contender.

The following is an excerpt from the zstd homepage. A more comprehensive comparison of compressors can be found in lzbench, which unfortunately does not include gzip.
In general, at default settings zstd is Pareto-optimal relative to gzip and zlib in terms of compression speed, decompression speed, and compression ratio.

It also provides a `--fast` option, shown below, for even faster compression and decompression, and an `--ultra` option for even higher compression.
There is also an `--adapt` option that adjusts the compression level according to runtime I/O conditions.
It can use multiple cores, or confine all I/O and compression processing to a single core.

However, given how I understand compression is used within this project, I don't see a need for anything beyond the default settings.

| Compressor name | Ratio | Compression | Decompression |
|---|---|---|---|
| zstd 1.5.1 -1 | 2.887 | 530 MB/s | 1700 MB/s |
| zlib 1.2.11 -1 | 2.743 | 95 MB/s | 400 MB/s |
| zstd 1.5.1 --fast=1 | 2.437 | 600 MB/s | 2150 MB/s |
| zstd 1.5.1 --fast=3 | 2.239 | 670 MB/s | 2250 MB/s |
| zstd 1.5.1 --fast=4 | 2.148 | 710 MB/s | 2300 MB/s |
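The table is easier to compare when each row is normalized to the zlib -1 baseline. A small sketch: the `BENCH` dictionary simply restates the figures quoted above, and `relative_to` is a hypothetical helper, not part of any library:

```python
# Figures as quoted from the zstd homepage table:
# (compression ratio, compression MB/s, decompression MB/s)
BENCH = {
    "zstd 1.5.1 -1": (2.887, 530, 1700),
    "zlib 1.2.11 -1": (2.743, 95, 400),
    "zstd 1.5.1 --fast=1": (2.437, 600, 2150),
    "zstd 1.5.1 --fast=3": (2.239, 670, 2250),
    "zstd 1.5.1 --fast=4": (2.148, 710, 2300),
}


def relative_to(baseline: str, bench: dict = BENCH) -> dict:
    """Express every row relative to `baseline`.

    output_size: compressed size as a fraction of the baseline's output
                 (size is proportional to 1 / ratio, so base_ratio / ratio)
    compression, decompression: throughput multiples of the baseline
    """
    b_ratio, b_comp, b_decomp = bench[baseline]
    return {
        name: {
            "output_size": b_ratio / ratio,
            "compression": comp / b_comp,
            "decompression": decomp / b_decomp,
        }
        for name, (ratio, comp, decomp) in bench.items()
    }


if __name__ == "__main__":
    for name, r in relative_to("zlib 1.2.11 -1").items():
        print(f"{name:22s} size x{r['output_size']:.2f}  "
              f"comp x{r['compression']:.2f}  decomp x{r['decompression']:.2f}")
```

For zstd -1 this works out to roughly 5.6x the compression throughput, 4.25x the decompression throughput, and output about 5% smaller than zlib -1. These are the homepage's numbers on its own test corpus; WARC content may behave differently.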

As for a comparison between gzip and zstd, here is an article about Amazon's decision:

> What he meant was that AWS changed how it stores its own service data (mostly logs) in S3: by switching (as a client of S3 themselves) from gzipping logs to zstd logs, **we were able to reduce our S3 storage costs by 30%.**

(Emphasis mine)

@tfmorris
Author

@chris-ha458 If you'd like to propose switching to a different algorithm, please open an issue for that where the cost-benefit tradeoff of the disruption to the ecosystem can be discussed.

This issue addresses a low-implementation-effort, 100% compatible performance improvement without any impact on downstream consumers, and I'd like to keep it focused on that.

@chris-ha458

I have since discovered this, and it seems my comment is irrelevant under these circumstances: this issue concerns a drop-in replacement, and there is already an investigation into WARC-zstd.

@sebastian-nagel

Hi @tfmorris, thanks for this pointer. Indeed, it would be an easy drop-in replacement. We'll give it a try for sure!

One more pointer about the optimizations on ARM:

Some Linux systems may already make use of crc32 in the default library. If the default zlib is already optimized, then using zlib-cloudflare may not have any impact on performance.

It seems that some major optimizations have already been picked into the default zlib packages; see, e.g., the Debian zlib changelog.
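Since the stock library may already include some of these optimizations, it is worth measuring before and after swapping it. The sketch below uses Python's `zlib` bindings as a stand-in for the JVM process, assuming the interpreter links zlib dynamically so that a preloaded replacement (for example via `LD_PRELOAD`, per the setup documentation) would be picked up. It prints the compile-time and runtime zlib version strings and a best-of-N gzip compression throughput. `gzip_throughput_mb_s` is a hypothetical helper, and the synthetic sample is only a placeholder for real WARC content; a fork may report a compatible version string, so the throughput number is likely the more reliable signal.

```python
import os
import time
import zlib


def gzip_throughput_mb_s(data: bytes, level: int = 6, repeat: int = 5) -> float:
    """Best-of-`repeat` gzip compression throughput (MB/s) using whichever
    zlib this interpreter actually loaded at runtime."""
    best = float("inf")
    for _ in range(repeat):
        comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # wbits=31 -> gzip framing
        start = time.perf_counter()
        comp.compress(data)
        comp.flush()
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6


if __name__ == "__main__":
    print("zlib (compile time):", zlib.ZLIB_VERSION)
    print("zlib (runtime):     ", zlib.ZLIB_RUNTIME_VERSION)
    # Placeholder input: hex text is only moderately compressible; use real WARC data if possible.
    sample = os.urandom(1 << 18).hex().encode() * 4
    for level in (1, 6):
        print(f"level {level}: {gzip_throughput_mb_s(sample, level):.0f} MB/s")
```

Running it once with the default library and once with the replacement preloaded shows whether the swap changes anything on a given system.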
