Evaluate zlib-cloudflare for 15% performance speedup of WarcRecordWriter #22
Comments
I feel that if compression tools are up for consideration, zstd should also be a contender. The following is an excerpt from the zstd homepage. A more comprehensive comparison of compressors can be found in It also provides a fast option, shown below, that gives even faster compression and decompression, and an ultra option that gives even more compression. However, as I understand how compression is used within the project, I do not see a need for anything other than the default.
As for a comparison between gzip and zstd, I can point to this article regarding Amazon's decision
(Emphasis mine)
@chris-ha458 If you'd like to propose switching to a different algorithm, please open an issue for that where the cost-benefit tradeoff of the disruption to the ecosystem can be discussed. This issue addresses a low implementation effort, 100% compatible performance improvement without any impact on the downstream consumers and I'd like to keep it focused on that.
I have discovered this, and it seems that my comment is completely irrelevant under these circumstances (this PR concerns a drop-in replacement, and there is already an investigation into WARC-zstd).
Hi @tfmorris, thanks for this pointer. Indeed, it would be an easy drop-in replacement. We'll give it a try for sure! One more pointer about the optimizations on ARM:
It seems that some major optimizations have already been picked into the default zlib packages, e.g., see the Debian zlib changelog.
According to this 2019 analysis, fully 1/3 of WarcRecordWriter's time is being spent in zlib.so. Cloudflare has a performance enhanced drop-in compatible version of zlib, zlib-cloudflare, which is claimed to be almost twice as fast at gzip compression.
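One way to sanity-check the "almost twice as fast" claim locally is a small gzip throughput measurement, run once against the stock zlib and once with the replacement library loaded. The sketch below is only an illustration: it uses Python's `zlib` module rather than the Java writer, the function names and the sample payload are made up for this example, and it assumes the interpreter links zlib dynamically so that a swapped-in library (for example via `LD_PRELOAD` on Linux, which is an assumption here and not taken from the linked documentation) is actually picked up.

```python
import time
import zlib


def gzip_compress(data: bytes, level: int = 6) -> bytes:
    """Compress data into the gzip container using whichever zlib is loaded."""
    # wbits=31 (16 + 15) asks zlib for a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(data) + compressor.flush()


def measure_throughput(data: bytes, level: int = 6, repeats: int = 5) -> float:
    """Return the best observed compression throughput in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        gzip_compress(data, level)
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6


if __name__ == "__main__":
    # Hypothetical payload: repetitive HTML-like text, roughly 2 MB.
    sample = (b"<html><body>" + b"Lorem ipsum dolor " * 64 + b"</body></html>\n") * 2000
    print("zlib runtime version:", zlib.ZLIB_RUNTIME_VERSION)
    print(f"gzip level 6: {measure_throughput(sample):.1f} MB/s")
```

Comparing the two reported MB/s figures gives a rough ratio only. Real WARC payloads and the JVM's own zlib binding may behave differently, so this does not replace profiling WarcRecordWriter itself.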
This could provide a significant speedup (~15% overall) for minimal implementation cost. There is documentation available which describes how to set it up. Ignore the fact that it's a Graviton page. It applies to all architectures.
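The ~15% figure follows from an Amdahl's-law style calculation: if one third of the runtime is spent in zlib and only that part gets faster, the rest of the runtime is unchanged. A minimal sketch of the arithmetic, where the 1/3 fraction comes from the analysis cited above and the individual speedup factors are illustrative assumptions:

```python
def overall_time_saved(fraction: float, component_speedup: float) -> float:
    """Fraction of total runtime saved when `fraction` of it runs `component_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    # The untouched part keeps its cost; the accelerated part shrinks.
    new_time = (1.0 - fraction) + fraction / component_speedup
    return 1.0 - new_time


if __name__ == "__main__":
    zlib_fraction = 1 / 3  # share of WarcRecordWriter time in zlib, per the 2019 analysis
    for speedup in (1.5, 1.8, 2.0):  # assumed zlib speedups, for illustration only
        saved = overall_time_saved(zlib_fraction, speedup)
        print(f"zlib {speedup:.1f}x faster -> {saved:.1%} less total time")
```

A full 2x zlib speedup would save about 16.7% of total time, and 1.8x would save about 14.8%, which is consistent with "almost twice as fast" yielding roughly 15% overall.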
While switching to a different algorithm is also possible, that would be much more disruptive to the ecosystem as compared to a drop-in replacement implementing the same algorithm.