[Epic] Release Automation & Helm Deployment for KGateway #10441

Open
4 of 24 tasks
timflannagan opened this issue Jan 13, 2025 · 7 comments

@timflannagan
Member

timflannagan commented Jan 13, 2025

Motivation

As we continue to bootstrap and finalize the repository transition from the solo-io organization to the new kgateway organization, it's critical to establish a release and deployment process for the new project. The main goals here are two-fold:

  1. Create a release workflow/automation that builds our Go binaries into multi-arch container images.
  2. Enable users to configure and deploy the project, including these published container images (e.g. via Helm).

This will ensure smoother releases, easier user adoption, and a cleaner, more maintainable codebase.
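To make goal 1 concrete: a multi-arch release usually pushes one image per architecture, then a manifest list that points at them. The Go sketch below computes those references under an assumed naming convention (`<repo>:<version>-<arch>` per architecture, `<repo>:<version>` for the manifest list). The repository path `ghcr.io/kgateway-dev/kgateway`, the amd64/arm64 platform set, and the `multiArchRefs` helper are illustrative assumptions, not the project's confirmed configuration.

```go
package main

import "fmt"

// ImageSet describes the references pushed for one multi-arch release.
// The naming convention is an assumption for illustration only.
type ImageSet struct {
	PerArch  []string // one image reference per architecture
	Manifest string   // manifest list that points at the per-arch images
}

// multiArchRefs (hypothetical helper) builds per-architecture tags of the
// form <repo>:<version>-<arch> plus a manifest list tag <repo>:<version>.
func multiArchRefs(repo, version string, arches []string) ImageSet {
	set := ImageSet{Manifest: fmt.Sprintf("%s:%s", repo, version)}
	for _, arch := range arches {
		set.PerArch = append(set.PerArch, fmt.Sprintf("%s:%s-%s", repo, version, arch))
	}
	return set
}

func main() {
	// Hypothetical repository path and platform set.
	set := multiArchRefs("ghcr.io/kgateway-dev/kgateway", "v0.0.0-main", []string{"amd64", "arm64"})
	for _, ref := range set.PerArch {
		fmt.Println("push:", ref)
	}
	fmt.Println("manifest:", set.Manifest)
}
```

With GoReleaser the same shape is normally expressed declaratively (per-arch `dockers` entries plus `docker_manifests`), so a helper like this only shows the structure a user ends up pulling.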

Task List

Needs Grooming

  • GHCR cleanup & registry housekeeping
  • Makefile & devexp cleanup
  • Decide on splitting up nightly and regular release registries
  • Dependency management and scanning (e.g. trivy, dependabot, renovate)
  • Publish release-specific, latest images, invariants, etc.
  • Add helm tests for the new chart
  • Determine whether cache restoration is needed for goreleaser job
  • Re-evaluate whether we want to stick with the v0.0.0-main semver hack for dev builds
  • Determine any gaps with GORELEASER_CURRENT_TAG approach needed for main/nightly releases
  • Determine the right flow for publishing a real release: workflow_dispatch with version, or pushing a tag locally, or via the release tab, etc.
  • Evaluate whether the docker/setup-qemu-action and docker/setup-buildx-action steps are both needed in the goreleaser job
  • Refactor the values.yaml organization, particularly the gateway class and parameters
  • Fix the default GatewayParameters template to include the sds and ai extension containers
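Two of the grooming items above (the v0.0.0-main convention and the GORELEASER_CURRENT_TAG approach) hinge on how a CI job picks the version GoReleaser should use. A minimal Go sketch, assuming the job runs on GitHub Actions (which sets `GITHUB_REF`) and that only `v`-prefixed tags and `main` are release sources; the `currentTag` helper is hypothetical and not part of the repository.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// devTag is the placeholder version for main-branch builds, matching the
// v0.0.0-main convention discussed in this issue.
const devTag = "v0.0.0-main"

// currentTag (hypothetical) maps a Git ref to the value a CI job could export
// as GORELEASER_CURRENT_TAG. Release tags pass through unchanged, builds
// from main fall back to the dev tag, and any other ref is rejected.
func currentTag(ref string) (string, error) {
	switch {
	case strings.HasPrefix(ref, "refs/tags/v"):
		return strings.TrimPrefix(ref, "refs/tags/"), nil
	case ref == "refs/heads/main":
		return devTag, nil
	default:
		return "", fmt.Errorf("no release tag defined for ref %q", ref)
	}
}

func main() {
	// GITHUB_REF is set by GitHub Actions; default to main for local runs.
	ref := os.Getenv("GITHUB_REF")
	if ref == "" {
		ref = "refs/heads/main"
	}
	tag, err := currentTag(ref)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("GORELEASER_CURRENT_TAG=%s\n", tag)
}
```

A workflow step could append the printed line to `$GITHUB_ENV` before invoking GoReleaser. Whether nightly builds need their own tag shape is exactly the open question in the list above.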

Misc.

Related to #10496, as purging legacy code will result in smaller binary sizes, quicker deploy times, and a smaller CVE surface for published images.

@lgadban
Contributor

lgadban commented Jan 28, 2025

track trivy or security scan automation

@lgadban
Contributor

lgadban commented Jan 29, 2025

@timflannagan the GE release setup had different registries/repos for full-fledged releases vs. dev artifacts (e.g. nightlies)
With the current goreleaser model, they will all go to the standard kgateway-dev GHCR, correct?
Do we need to consider the split model at all?

@timflannagan
Member Author

@lgadban yep, the current approach will all live in the kgateway-dev ghcr.io. there's some warrant in splitting them up, but I wonder whether it's too premature given the state of the project at the moment.

@lgadban
Contributor

lgadban commented Jan 29, 2025

> yep, the current approach will all live in the kgateway-dev ghcr.io. there's some warrant in splitting them up, but I wonder whether it's too premature given the state of the project at the moment.
That's fair. Is there anything we would want to change in the current design in case we want to make that split in the future?

@timflannagan
Member Author

@lgadban I think I'd like to land #10540 in main first before changing the approach. From my perspective, getting signal that things are working with the simplistic approach is good, and we can split it up before we cut an initial release.

In theory, splitting up the registry for nightly artifacts vs. released/stable artifacts is a minimal number of changes, and that leaves us with a couple of options:

  • Dice them up before the initial release
  • Dice them up after, but do it early in the release lifecycle to limit the migration pain for users that depend on nightly artifacts

WDYT?
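The split stays a minimal change if registry selection lives in one place. A minimal Go sketch, assuming SemVer pre-release versions (such as v0.0.0-main) count as dev artifacts; the nightly path `ghcr.io/kgateway-dev/nightly` and the helper names are hypothetical, not existing registries or repository code.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical registry locations. Today everything goes to the
// kgateway-dev GHCR; a split would route dev builds elsewhere.
const (
	stableRegistry  = "ghcr.io/kgateway-dev"
	nightlyRegistry = "ghcr.io/kgateway-dev/nightly"
)

// isDevVersion reports whether a version is a SemVer pre-release, which this
// sketch treats as a dev artifact. Build metadata (after '+') is dropped
// first so that e.g. "v1.0.0+build-1" is not mistaken for a pre-release.
func isDevVersion(version string) bool {
	core, _, _ := strings.Cut(version, "+")
	return strings.Contains(core, "-")
}

// imageRef picks the registry for an image based on its version, so the
// split can be introduced later by changing only this routing rule.
func imageRef(name, version string) string {
	registry := stableRegistry
	if isDevVersion(version) {
		registry = nightlyRegistry
	}
	return fmt.Sprintf("%s/%s:%s", registry, name, version)
}

func main() {
	fmt.Println(imageRef("kgateway", "v0.0.0-main"))
	fmt.Println(imageRef("kgateway", "v1.0.0"))
}
```

Pointing both constants at the same registry reproduces today's single-registry behavior, which is one way to keep the option open without migrating anything yet.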

@timflannagan
Member Author

Quick update on the current state of this work:

  • We removed the gloo helm chart in favor of a new kgateway chart that removed all the baggage/tech debt from the previous chart. This chart is being deployed in CI and published in the release workflow
  • We have main branch container images and helm charts published to this repository's ghcr.io registry. We decided on using v0.0.0-main for the tagging convention, largely as a workaround: helm charts need a valid semver version, and this was the easiest way to test the new release pipeline without committing too much to a formal tagging convention (i.e. maintaining some flexibility).
  • A substantial cleanup that reduces the size of the various container images. This shrinks the CVE surface in addition to speeding up deploys. See [Epic] Cleanup Repo #10496 for the tracker of that work stream.
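Why v0.0.0-main works as a placeholder: Helm requires chart versions to be SemVer 2, and `0.0.0-main` is a valid version whose pre-release identifier is `main`, whereas a bare `main` is not. The Go sketch below checks candidate tags against the regular expression published on semver.org. Stripping a leading `v` is an assumption about how tags map to chart versions, and `isChartVersion` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// semverRe is the SemVer 2.0.0 pattern suggested at semver.org.
var semverRe = regexp.MustCompile(`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)` +
	`(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?` +
	`(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$`)

// isChartVersion reports whether a Git-style tag (optionally prefixed with
// "v") is a valid SemVer 2 version.
func isChartVersion(tag string) bool {
	return semverRe.MatchString(strings.TrimPrefix(tag, "v"))
}

func main() {
	for _, tag := range []string{"v0.0.0-main", "v1.2.3", "main", "v1.2"} {
		fmt.Printf("%-12s valid=%t\n", tag, isChartVersion(tag))
	}
}
```

Because `0.0.0-main` is a pre-release of `0.0.0`, SemVer precedence sorts it below every real release, so it cannot be mistaken for a newer version than a published one.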

There are a couple of misc. items that we'd still like to accomplish while we're here overhauling the release process:

  • Add OCI labels/annotations to our published container images. This is a best practice, and particularly useful for main branch published images
  • Re-evaluate the current docker manifest tags (e.g. publish :latest, or invariants like debug containers, etc.)
  • Rethink the legacy gloo changelog management process. This introduced some devex pains that we'd like to overhaul while we have the opportunity. See Programmatic Changelog Checking, Github #10381 and Add EP-10381: Overhauling changelog process #10422 for some work being done in that realm. Additionally, the new release pipeline allows us to inject/pipe custom release notes into the published release notes without any friction. This will provide a nice e2e for the initial release, but isn't strictly necessary and doesn't block an initial release
  • Any additional automation to aid project maintenance, e.g. replicating quay.io-style automatic purging on ghcr.io, that might be useful as the project pumps out more releases over time
  • Evaluate the current base images. Do we want to standardize on distroless going forward?
  • Any project documentation that will aid maintainers (e.g. how to cut a release) or set expectations with the community around the release process (e.g. release cadence, backport policy, etc.). This isn't something that's strictly necessary in the very short term, but it would be good to be pragmatic about this while it's fresh in the mind.

@danehans
Contributor

danehans commented Feb 6, 2025

I can't add #10586 to the above task list. @timflannagan can you?

Projects
Status: Epics
Development

No branches or pull requests

4 participants