Adopting Zstandard/"zstd" by default #17720
Comments
To reiterate here (with the DOI slant) the conclusion I've just posted on moby/moby#48328 (comment): Debian Stable and Oldstable (Bookworm and Bullseye) are still stuck on 20.10 (and thus have no zstd support), which means it's going to be several more years before DOI can reasonably consider switching to zstd. At the very least, we'd need the release of Debian Trixie (which has a new enough version), and ideally we'd be a bit further into the Trixie support window before we consider dropping support for Bookworm users (maybe even all the way to that 2028 date, but at the very least likely the 2026 date of Bullseye's EOL 😭).

Perhaps this is useful context for why I think uncompressed canonical layers are a better solution for the industry overall: registries and runtimes can then decide which compression they want to use for storage (making whatever speed/space/CPU tradeoffs are appropriate for their own situation), and can coordinate at the transport level which compression to use over the wire based on what they're willing to support. Current users would be mostly unaffected, because the "failure mode" is simply that layers are bigger than they used to be while everything else still works fine (so upgrading becomes a very compelling carrot rather than a stick of broken workflows).

That would also resolve some of the interesting struggles around layer IDs vs "diff IDs" that the image specification has today. Doing "random access" from the registry API would then be trivial range requests across the tar (needing only a map of tar entries to offsets, rather than complex compression coordination). 🤷

(To put all of this another way, I don't think zstd solves enough "interesting" problems to be worth breaking so many users at the present time. ❤️)
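As a rough illustration of the "map of tar entries to offsets" idea, here is a minimal Python sketch. It assumes the layer blob is a plain, uncompressed tar; the helper names (`make_layer`, `build_offset_index`, `read_range`) are hypothetical, and `read_range` only stands in for an HTTP range request against a registry rather than performing one.

```python
import io
import tarfile


def make_layer(files: dict[str, bytes]) -> bytes:
    """Build a small uncompressed tar in memory (stand-in for a layer blob)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_offset_index(tar_bytes: bytes) -> dict[str, tuple[int, int]]:
    """Map each regular file in an uncompressed tar to (data offset, size)."""
    index = {}
    # mode="r:" refuses compressed input, which matches the assumption here.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_range(blob: bytes, offset: int, size: int) -> bytes:
    """Hypothetical stand-in for a request with 'Range: bytes=offset-(offset+size-1)'."""
    return blob[offset:offset + size]


layer = make_layer({"etc/hostname": b"example\n", "bin/tool": b"\x00" * 1000})
index = build_offset_index(layer)
offset, size = index["etc/hostname"]
print(read_range(layer, offset, size))
```

With a compressed blob, the same lookup would need compression-aware seek points, which is the "complex compression coordination" mentioned above.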
Yes, I agree overall. When I made ostree it was very intentionally inspired by git, which does not checksum compressed content, and so neither has this problem. (Tangential) One related thing: I've been having a vague thought that we could try to encourage signing tools (which commonly sign the manifest today) to also produce a secondary signature that covers only the uncompressed content and can be carried around alongside it (I think it'd just be the manifest without the compressed-checksum layer IDs? ...which gets into issues around canonical JSON, etc., if we need to substitute values reproducibly).
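To make the point about checksumming compressed vs uncompressed content concrete, here is a small Python sketch. The payload and the two gzip settings are arbitrary assumptions chosen for illustration: two gzip encodings of the same bytes hash differently, while the digest of the decompressed content is stable.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Arbitrary stand-in for an uncompressed layer tar.
layer = b"example layer contents\n" * 1000

# Same content, two different gzip encodings (level and header mtime differ).
blob_a = gzip.compress(layer, compresslevel=1, mtime=0)
blob_b = gzip.compress(layer, compresslevel=9, mtime=1)

# Digests over the compressed bytes depend on how the blob was compressed...
print(sha256_hex(blob_a) == sha256_hex(blob_b))

# ...while digests over the uncompressed bytes depend only on the content.
print(sha256_hex(gzip.decompress(blob_a)) == sha256_hex(gzip.decompress(blob_b)))
```

A signature computed over the second kind of digest would survive recompression of the blob, which is the property the secondary-signature idea relies on.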
I'm filing this as a tracking issue for switching to Zstandard/"zstd" compression of layer blobs in Official Images so we can have a centralized place to point folks and to collect/dump relevant information.
To be very explicit, I am personally of the opinion that we would gain far more as an ecosystem by switching to canonically uncompressed blobs instead (and thus relying on at-rest and client-/server-negotiated transport compression, as suggested in opencontainers/distribution-spec#235 and linked issues/PRs such as containerd/containerd#8166 and distribution/distribution#3900), as this solves more interesting problems than simply swapping out gzip for zstd, but I do not think it's appropriate to block zstd adoption here entirely based on that opinion (especially since that idea gaining traction would just as easily mean dropping zstd as it does dropping gzip).
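For readers unfamiliar with client-/server-negotiated transport compression, the following Python sketch shows the general shape of the idea. It is not taken from any of the linked proposals: the `negotiate` function is hypothetical, it assumes an HTTP-style `Accept-Encoding` header, and it simplifies real negotiation by walking the server's preference order rather than ranking by q-value.

```python
def negotiate(accept_encoding: str, server_prefs: list[str]) -> str:
    """Pick the first server-preferred encoding the client accepts.

    Falls back to "identity" (no transport compression) when nothing matches.
    """
    accepted: dict[str, float] = {}
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        accepted[token] = q
    for encoding in server_prefs:
        if accepted.get(encoding, 0.0) > 0:
            return encoding
    return "identity"


# A newer client gets zstd, an older one gets gzip, and a client that
# advertises nothing still receives the uncompressed canonical blob.
print(negotiate("gzip, zstd", ["zstd", "gzip"]))
print(negotiate("gzip", ["zstd", "gzip"]))
print(negotiate("", ["zstd", "gzip"]))
```

The point of the design is that the blob's identity stays tied to the uncompressed bytes, so the encoding chosen per request can change without affecting digests.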
In order to determine whether we're far enough past the "adoption curve" to feel good about changing our defaults (i.e., to shrink the list of stragglers who will come and complain about their outdated software no longer working with our images), I am collecting here a list of popular runtimes/tools and the version/date at which each added zstd support.
Additionally, I am compiling a companion table of distribution releases and their relevant EOL dates in cases where they have runtimes/tools that are not recent enough to have zstd support.
(The more I compile this table, the less happy with it I am 😭 I can't figure out a clean way to organize it that still includes all the data I need it to, but isn't so heinously verbose or superwide.)
See also: