Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement parallelism in deploy #105

Merged
merged 1 commit into from
Jan 24, 2025

Conversation

tianon
Copy link
Member

@tianon tianon commented Jan 23, 2025

This requires the input to be in the correct order (children first), but that was already an implied constraint.

In my testing, this takes our .test/test.sh --deploy block testing deploy from ~5s down to ~2.5s.

I used "show timestamps" in our GHA CI jobs to compare them also, and they're currently at ~7s, and after this change they drop to ~1s.

I've tested this with my personal deploy job, which was taking ~2m40s before, and is ~1m18s after.

@tianon tianon requested a review from a team as a code owner January 23, 2025 20:32
@tianon tianon force-pushed the parallel-deploy branch 2 times, most recently from e9539d5 to f727fb0 Compare January 23, 2025 21:24
@tianon
Copy link
Member Author

tianon commented Jan 23, 2025

Coverage Diff: (lightly munged so it's easier to see actual change vs line numbers, etc)
--- before.txt	2025-01-23 14:09:09.231284131 -0800
+++ after.txt	2025-01-23 14:14:40.198581807 -0800
@@ -3,10 +3,12 @@
 cmd/builds/main.go:			loadCacheFromFile		76.0%
 cmd/builds/main.go:			saveCacheToFile			78.6%
 cmd/builds/main.go:			main				80.7%
+cmd/deploy/input.go:			clone				100.0%
 cmd/deploy/input.go:			normalizeInputRefs		100.0%
 cmd/deploy/input.go:			normalizeInputLookup		100.0%
-cmd/deploy/input.go:			NormalizeInput			80.0%
-cmd/deploy/main.go:			main				77.1%
+cmd/deploy/input.go:			NormalizeInput			79.1%
+cmd/deploy/input.go:			do				87.5%
+cmd/deploy/main.go:			main				92.8%
 cmd/lookup/main.go:			main				74.6%
 om/om.go:				Keys				100.0%
 om/om.go:				Get				100.0%
@@ -26,12 +28,13 @@
 registry/cache.go:			ResolveBlob			100.0%
 registry/cache.go:			ResolveTag			84.6%
 registry/cache.go:			PushManifest			100.0%
-registry/cache.go:			PushBlob			72.2%
+registry/cache.go:			PushBlob			94.4%
 registry/cache.go:			MountBlob			94.1%
 registry/client.go:			Client				56.2%
 registry/client.go:			EntryForRegistry		86.7%
 registry/lookup.go:			Lookup				87.5%
-registry/push.go:			EnsureManifest			78.9%
+registry/manifest-children.go:	ParseManifestChildren		100.0%
+registry/push.go:			EnsureManifest			73.2%
 registry/push.go:			CopyManifest			71.4%
 registry/push.go:			EnsureBlob			77.8%
 registry/push.go:			CopyBlob			65.5%
@@ -46,4 +49,4 @@
 registry/synthesize-index.go:		SynthesizeIndex			81.1%
 registry/synthesize-index.go:	setRefAnnotation		100.0%
 registry/synthesize-index.go:	normalizeManifestPlatform	83.9%
-total:					(statements)			79.6%
+total:					(statements)			80.9%

Edit: again, it's a time-limited link (thanks GHA!), but https://gha.dag.dev/trace?artifact=2477631176&file=coverage.html&name=coverage&size=52560&uri=https%3A%2F%2Fgithub.com%2Fdocker-library%2Fmeta-scripts%2Factions%2Fruns%2F12939026432%2Fjob%2F36090462795 will let you browse said coverage as HTML 👍

This requires the input to be in the correct order (children first), but that was already an implied constraint.

In my testing, this takes our `.test/test.sh --deploy` block testing deploy from ~5s down to ~2.5s.

I used "show timestamps" in our GHA CI jobs to compare them also, and they're currently at ~7s, and after this change they drop to ~1s.

I've tested this with my personal deploy job, which was taking ~2m40s before, and is ~1m18s after.
@tianon
Copy link
Member Author

tianon commented Jan 23, 2025

Hacked in a test in prod, and it's slower 😭

https://doi-janky.infosiftr.net/job/meta/view/deploys/job/ppc64le/job/deploy/11381/console

(~16m job is already at ~19m and still going)

@tianon
Copy link
Member Author

tianon commented Jan 23, 2025

Ok, that was actually my bad, but it made the test more useful. What I did was accidentally run a test of this without DOCKERHUB_PUBLIC_PROXY, which shows us that not only is it still relevant, it's actually crucial to our speed.

I ran another test of this that did use the proxy, and it was ~10m30s.

So to summarize:

  • before: ~16m
  • after: ~10m30s
  • without proxy: ~28m

@tianon
Copy link
Member Author

tianon commented Jan 24, 2025

(we should merge this first thing tomorrow, not tonight)

@tianon tianon merged commit e25ff36 into docker-library:main Jan 24, 2025
1 check passed
@tianon tianon deleted the parallel-deploy branch January 24, 2025 17:22
@tianon
Copy link
Member Author

tianon commented Jan 31, 2025

Now that we've changed some things on the backend, this should be reasonable to reintroduce (reverted in #107, made even safer to reintroduce by #106).

That should definitely wait until at least next Monday, though. I'll get a PR open for it so we have a better place to discuss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants