Adopting Zstandard/"zstd" by default #17720

tianon · 2024-10-14T19:22:00Z

I'm filing this as a tracking issue for switching to Zstandard/"zstd" compression of layer blobs in Official Images so we can have a centralized place to point folks and to collect/dump relevant information.

To be very explicit, I am personally of the opinion that we would gain far more as an ecosystem by switching to canonically uncompressed blobs instead, and thus relying on at-rest compression and client-/server-negotiated transport compression (as suggested in opencontainers/distribution-spec#235 and linked issues/PRs such as containerd/containerd#8166 and distribution/distribution#3900), because this solves more interesting problems than simply swapping out gzip for zstd. However, I do not think it's appropriate to block zstd adoption here entirely based on that opinion, especially since that idea gaining traction would just as easily mean dropping zstd as it would dropping gzip.

To determine whether we're far enough along the "adoption curve" to feel good about changing our defaults (i.e., to shrink the list of stragglers who will come and complain about their outdated software no longer working with our images), I am collecting here a list of popular runtimes/tools and the version/date at which each added zstd support.

Runtime/Tool	Versions	Dates	Repology
Docker/Moby	23+	2023-02-01	https://repology.org/project/docker/badges
containerd	1.5+	2021-05-03	https://repology.org/project/containerd/badges
crane/go-containerregistry	0.13+	2023-01-24
containers/image (library)	4+	2019-10-01
podman	1.6.2+	2019-10-19	https://repology.org/project/podman/badges
buildah	1.11.3+	2019-10-04	https://repology.org/project/buildah/badges
skopeo	0.1.40+	2019-10-29	https://repology.org/project/skopeo/badges
cri-o	1.17+	2020-02-07	https://repology.org/project/cri-o/badges
Singularity	3.6+ ?	2020-07-14	https://repology.org/project/singularity-container/badges

Additionally, I am compiling a companion table of distribution releases and their relevant EOL dates in cases where they have runtimes/tools that are not recent enough to have zstd support.

Runtime/Tool	Distro	Version	EOL Date
Docker	Debian Trixie (13)	26+
Docker	Debian Bookworm (12)	20.10	2028-06-30
Docker	Debian Bullseye (11)	20.10	2026-08-31
Docker	Debian Buster (10)	18.09	2024-06-30
Docker	Ubuntu Noble (24.04)	24+	2029-06
Docker	Ubuntu Jammy (22.04)	24+	2027-06
Docker	Ubuntu Focal (20.04)	24+	2025-04
containerd	Debian Trixie (13)	1.7+
containerd	Debian Bookworm (12)	1.6
containerd	Debian Bullseye (11)	1.4
containerd	Debian Buster (10)	-
containerd	Ubuntu Noble (24.04)	1.6
containerd	Ubuntu Jammy (22.04)	1.6
containerd	Ubuntu Focal (20.04)	1.6

(The more I compile this table, the less happy with it I am 😭 I can't figure out a clean way to organize it that still includes all the data I need it to, but isn't so heinously verbose or superwide.)
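
As a rough illustration of how the two tables combine, here is a minimal Python sketch that flags which distro-packaged versions predate zstd support. The data is copied from the tables above; the names `MIN_ZSTD`, `PACKAGED`, `parse_version`, and `lacks_zstd` are hypothetical helpers invented for this example, and the version parsing is deliberately simplistic.

```python
# Minimum versions with zstd support, copied from the first table above.
MIN_ZSTD = {"Docker": (23,), "containerd": (1, 5)}

# (tool, distro release, packaged version) rows copied from the second table.
# "-" means the release does not package the tool at all.
PACKAGED = [
    ("Docker", "Debian Trixie (13)", "26+"),
    ("Docker", "Debian Bookworm (12)", "20.10"),
    ("Docker", "Debian Bullseye (11)", "20.10"),
    ("Docker", "Debian Buster (10)", "18.09"),
    ("Docker", "Ubuntu Noble (24.04)", "24+"),
    ("Docker", "Ubuntu Jammy (22.04)", "24+"),
    ("Docker", "Ubuntu Focal (20.04)", "24+"),
    ("containerd", "Debian Trixie (13)", "1.7+"),
    ("containerd", "Debian Bookworm (12)", "1.6"),
    ("containerd", "Debian Bullseye (11)", "1.4"),
    ("containerd", "Debian Buster (10)", "-"),
    ("containerd", "Ubuntu Noble (24.04)", "1.6"),
    ("containerd", "Ubuntu Jammy (22.04)", "1.6"),
    ("containerd", "Ubuntu Focal (20.04)", "1.6"),
]


def parse_version(text):
    """Turn '20.10' or '26+' into a comparable tuple; return None for '-'."""
    text = text.rstrip("+")
    if text == "-":
        return None
    return tuple(int(part) for part in text.split("."))


def lacks_zstd(tool, version_text):
    """True when the packaged version is older than the first zstd-capable one."""
    version = parse_version(version_text)
    return version is not None and version < MIN_ZSTD[tool]


stragglers = [
    (tool, distro) for tool, distro, version in PACKAGED if lacks_zstd(tool, version)
]

for tool, distro in stragglers:
    print(f"{tool} on {distro} cannot pull zstd layers")
```

Releases that do not package a tool at all (the "-" row) are skipped rather than counted as stragglers, which is a judgment call of this sketch and not something the tables themselves decide.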


tianon · 2024-10-30T16:41:09Z

To reiterate here the conclusion I've just posted on moby/moby#48328 (comment), with the DOI (Docker Official Images) slant: I think Debian Stable and Oldstable (Bookworm and Bullseye) still being stuck on Docker 20.10 (and thus having no zstd support) means it's going to be several more years before DOI can reasonably consider switching to zstd. At the very least, we'd need the release of Debian Trixie (which has a new enough version), but ideally we'd wait a bit further into the Trixie support window before we consider dropping support for Bookworm users (maybe even all the way to that 2028 date, but at the very least likely the 2026 date of Bullseye's EOL 😭).

Perhaps this is useful context for why I think uncompressed canonical layers is a better solution for the industry overall -- then registries and runtimes can decide which compression they want to use for storage (making whatever appropriate speed/space/CPU tradeoffs for their own situation), and can coordinate at the transport level which compression to use over-the-wire based on what they're willing to support. Current users would all be mostly unaffected because the "failure mode" is simply that layers are bigger than they used to be, but everything still works fine otherwise (so now upgrading is a very compelling carrot vs a stick/broken workflows). That would also resolve some of the interesting struggles around layer IDs vs "diff IDs" that the image specification has today. Doing "random access" from the registry API would then be trivial range requests across the tar (needing only a map of tar entries to offsets vs complex compression coordination). 🤷
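
The "map of tar entries to offsets" idea can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not how any registry or runtime implements it: the layer is built in memory, the helper names (`build_layer`, `index_entries`, `fetch_range`) are hypothetical, and `fetch_range` is a plain byte slice standing in for an HTTP range request against a blob.

```python
import io
import tarfile


def build_layer(files):
    """Build an uncompressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def index_entries(layer):
    """Map each regular file's name to (offset of its data, size in bytes)."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar if m.isfile()}


def fetch_range(layer, start, length):
    """Stand-in for a request with 'Range: bytes=start-(start+length-1)'."""
    return layer[start:start + length]


layer = build_layer({
    "etc/hostname": b"example\n",
    "usr/share/doc/readme": b"hello from the layer\n",
})
index = index_entries(layer)

# Read one file without touching the rest of the archive.
offset, size = index["usr/share/doc/readme"]
print(fetch_range(layer, offset, size))
```

Because the archive is not compressed, the offsets in the index point directly into the stored bytes. With a gzip or zstd blob, the same lookup would first need some way to begin decompression partway through the stream.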

(To put all of this another way, I don't think zstd solves enough "interesting" problems to be worth breaking so many users at the present time. ❤️)

cgwalters · 2024-10-31T15:40:42Z

> Perhaps this is useful context for why I think uncompressed canonical layers is a better solution for the industry overall

Yes, I agree overall. When I made ostree it was very intentionally inspired by git, which does not checksum compressed content, and so neither has this problem.
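
To make the problem concrete, here is a small Python sketch of why checksumming compressed content ties an identifier to the compressor. The payload is a made-up stand-in for a layer tar, and `lzma` stands in for zstd only because it ships in the standard library; both are assumptions of the example. The digest of the uncompressed bytes (what the image spec calls the diff ID) survives a change of compressor, while the digest of the compressed blob does not.

```python
import gzip
import hashlib
import lzma


def digest(data):
    """Format a SHA-256 digest the way image manifests write them."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical payload standing in for the bytes of an uncompressed layer tar.
layer_tar = b"pretend this is the bytes of an uncompressed layer tar\n" * 100

# Two compressed forms of the same content. lzma is a stand-in for zstd here.
gzip_blob = gzip.compress(layer_tar, mtime=0)
xz_blob = lzma.compress(layer_tar)

diff_id = digest(layer_tar)      # digest of the uncompressed content
gzip_digest = digest(gzip_blob)  # digest a manifest would list for the gzip blob
xz_digest = digest(xz_blob)      # digest a manifest would list for the other blob

# The blob digests disagree even though the content is identical...
assert gzip_digest != xz_digest
# ...while both blobs decompress to content with the same diff ID.
assert digest(gzip.decompress(gzip_blob)) == diff_id
assert digest(lzma.decompress(xz_blob)) == diff_id

print("diff ID:    ", diff_id)
print("gzip digest:", gzip_digest)
print("xz digest:  ", xz_digest)
```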

(tangential) One thing related to this is I've been having a vague thought that we could try to encourage signing tools (which commonly sign the manifest today) to also have a secondary signature that covers only the uncompressed content and that can also be carried around (I think it'd just be the manifest without the compressed-checksum layer IDs? ...which gets into issues around canonical JSON etc. if we need to substitute values reproducibly).

tianon mentioned this issue Oct 30, 2024

distribution: Set default compression to zstd/fastest moby/moby#48328

Open
