
operator: upgrade all control plane nodes first #3444

Open
wants to merge 9 commits into base: burgerdev/k8s-1.31

Conversation

@3u13r (Member) commented Oct 20, 2024

Context

Allow #3396. Since kubelets must not communicate with a KubeAPI server that has an older version than the kubelet itself, we need to upgrade all control-plane nodes before upgrading the worker nodes. Control-plane nodes are configured so that they only talk to the local KubeAPI server, which matches the kubelet version.

Proposed change(s)

  • Allow increasing the node budget: this is technically a full feature that I thought we already had. I needed it for the env test that tries to upgrade the worker nodes, so that I can verify that the control-plane pending node is created but the worker node is not. This should fail because of our explicit check and not because of the missing node budget.
  • Generally improve test coverage of the nodeversion env test since we are upgrading 2 nodes in different scaling groups now.
  • Don't call out to the cloud provider API to create new worker nodes if there are still control planes in the outdated or donor category, even if we would have enough node budget to do so.
  • Since the operator code-gen make target didn't work for me, I bumped the controller-gen version (this was seemingly needed so that it plays nicely with Go workspaces), reverted the make target scripts back to their original form, and manually copied the newly generated file over to the CLI Helm embedding (we might want to automate this in the future when we use Bazel for all parts of the operator).
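The gating rule from the third bullet above can be sketched as follows. This is an illustrative sketch, not the operator's actual code; the type and function names (`nodeCounts`, `allowWorkerCreation`) are hypothetical, loosely mirroring the outdated/donor bookkeeping described in this PR.

```go
package main

import "fmt"

// nodeCounts is a hypothetical summary of the cluster state as the
// operator sees it (names are illustrative, not the real API).
type nodeCounts struct {
	outdatedControlPlanes int // control-plane nodes still on the old version
	donorControlPlanes    int // control-plane nodes currently handing over
	nodeBudget            int // remaining budget for creating new nodes
}

// allowWorkerCreation sketches the new rule: even if the node budget
// would allow it, no new worker nodes are requested while any
// control-plane node is still outdated or acting as a donor.
func allowWorkerCreation(c nodeCounts) bool {
	if c.outdatedControlPlanes > 0 || c.donorControlPlanes > 0 {
		return false
	}
	return c.nodeBudget > 0
}

func main() {
	// Budget is available, but a control plane is still outdated.
	fmt.Println(allowWorkerCreation(nodeCounts{outdatedControlPlanes: 1, nodeBudget: 3}))
	// All control planes are done; budget decides.
	fmt.Println(allowWorkerCreation(nodeCounts{nodeBudget: 3}))
}
```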

How to test:

bazel test --test_output=all --cache_test_results=no //operators/constellation-node-operator/controllers:controllers_test

Related issue

kubernetes/kubernetes#127316

Checklist

  • Link to Milestone

@3u13r 3u13r added the no changelog Change won't be listed in release changelog label Oct 20, 2024
@3u13r 3u13r requested a review from derpsteb as a code owner October 20, 2024 20:48
@3u13r 3u13r requested a review from burgerdev October 20, 2024 20:49
@3u13r 3u13r force-pushed the euler/feat/operator/upgrade-control-planes-first branch from 39f28da to 5000cc9 Compare October 20, 2024 20:54
@3u13r 3u13r changed the title Euler/feat/operator/upgrade control planes first operator: upgrade all control plane node first Oct 20, 2024
@3u13r 3u13r force-pushed the euler/feat/operator/upgrade-control-planes-first branch from 5000cc9 to 14e650b Compare October 20, 2024 21:27
Only use feature flag on K8s versions that support that feature flag.
Otherwise Constellation throws an error during init.
Also fixup some tests.
This bumps the controller-gen version and also adjusts the generate commands (back to the original ones). This allows correct generation of CRDs and Go code.
@3u13r 3u13r force-pushed the euler/feat/operator/upgrade-control-planes-first branch from 14e650b to 5fcf88f Compare October 20, 2024 22:14
@3u13r 3u13r force-pushed the euler/feat/operator/upgrade-control-planes-first branch 2 times, most recently from a612103 to d0e8a38 Compare October 20, 2024 23:57
Comment on lines +57 to +61
maxNodeBudget:
description: MaxNodeBudget is the maximum number of nodes that can
be created simultaneously.
format: int32
type: integer
Member

Is this problematic for upgrades, since we never upgrade the installed CRDs through helm?

Member Author

No, I think Helm has some upgrade magic built in regarding CRDs. And since this time we only add a single optional field, we are fine. Proof: https://github.com/edgelesssys/constellation/actions/runs/11455280858/job/31871905456.
Or do you have a concrete error in mind that I'm missing?

Member

As far as I know, Helm simply doesn't do anything if a CRD already exists: https://helm.sh/docs/chart_best_practices/custom_resource_definitions/#method-1-let-helm-do-it-for-you

So if we change the CRDs, running helm upgrade won't actually apply these changes.
From what I can tell, this effectively makes the maxNodeBudget option in the NodeVersion CRD non-functional.
I'm assuming everything still works fine because we default to 1 if the value is not set in the CR.
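The defaulting behavior described here can be sketched as follows. This is an illustrative sketch under the assumption stated above (fall back to 1 when the field is unset); the function name `effectiveNodeBudget` is hypothetical, not the operator's actual code.

```go
package main

import "fmt"

// effectiveNodeBudget returns the budget the operator would act on:
// an unset maxNodeBudget (the zero value, since the CRD field is
// optional) falls back to 1, which is why clusters whose installed
// CRD predates the new field keep behaving exactly as before.
func effectiveNodeBudget(maxNodeBudget uint32) uint32 {
	if maxNodeBudget == 0 {
		return 1
	}
	return maxNodeBudget
}

func main() {
	fmt.Println(effectiveNodeBudget(0)) // field unset in the CR
	fmt.Println(effectiveNodeBudget(5)) // field set explicitly
}
```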

Member Author

Oh, thanks for the hint. Yes, the behavior is (hopefully) exactly the same, so I think we are good for now (also since the e2e test passed). I didn't want to advertise this feature anyway, but even if we do, we should document this constraint.
In the future(tm), we likely want to do what others do (e.g., Istio: https://istio.io/latest/docs/setup/upgrade/helm/#canary-upgrade-recommended).

@3u13r 3u13r force-pushed the euler/feat/operator/upgrade-control-planes-first branch from d0e8a38 to a8ec800 Compare October 22, 2024 09:37
Before we call out to the cloud provider, we check whether there are still control-plane nodes that are outdated (or donors). If there are, we don't create any worker nodes, even if we have the budget to do so.
@3u13r 3u13r force-pushed the euler/feat/operator/upgrade-control-planes-first branch from a8ec800 to 72f85f3 Compare October 22, 2024 10:47
Contributor
Coverage report

Package Old New Trend
bootstrapper/internal/kubernetes/k8sapi 13.10% 12.60% ↘️
internal/constellation/kubecmd 62.40% 24.90% ↘️
internal/versions 9.70% 8.70% ↘️
operators/constellation-node-operator 0.00% 0.00% 🚧
operators/constellation-node-operator/api/v1alpha1 0.00% 0.00% 🚧
operators/constellation-node-operator/controllers 30.80% 26.20% ↘️
operators/constellation-node-operator/internal/constants [no test files] [no test files] 🚧
operators/constellation-node-operator/internal/controlplane 100.00% 5.70% ↘️
operators/constellation-node-operator/internal/etcd 65.80% 9.70% ↘️
operators/constellation-node-operator/internal/node 100.00% 9.20% ↘️

@3u13r 3u13r changed the title operator: upgrade all control plane node first operator: upgrade all control plane nodes first Oct 22, 2024
@@ -19,4 +19,6 @@ const (
PlaceholderControlPlaneScalingGroupName = "control-planes-id"
// PlaceholderWorkerScalingGroupName name of the worker scaling group used if upgrades are not yet supported.
PlaceholderWorkerScalingGroupName = "workers-id"
// ControlPlaneRoleLabel label used to identify control plane nodes.
Contributor

Suggested change
// ControlPlaneRoleLabel label used to identify control plane nodes.
// ControlPlaneRoleLabel label used to identify control plane nodes.
// https://kubernetes.io/docs/reference/labels-annotations-taints/#node-role-kubernetes-io-control-plane

Nit, just to document that this is canonical.

Comment on lines 837 to 838
r.RLock()
defer r.RUnlock()
Contributor

Does this need a write lock now?

@@ -21,6 +21,8 @@ type NodeVersionSpec struct {
KubernetesComponentsReference string `json:"kubernetesComponentsReference,omitempty"`
// KubernetesClusterVersion is the advertised Kubernetes version of the cluster.
KubernetesClusterVersion string `json:"kubernetesClusterVersion,omitempty"`
// MaxNodeBudget is the maximum number of nodes that can be created simultaneously.
MaxNodeBudget uint32 `json:"maxNodeBudget,omitempty"`
Contributor

I'm against making this a user-facing feature for now, because we did not discuss its semantics sufficiently. What if the user sets it to 1000 - do we replace all control-plane nodes at once? Should we have different budgets for control planes and for workers? Could this be relative? Should we rather implement a different upgrade algorithm (3533 etc)? I'm also reminded of the evolution of https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.30/#rollingupdatedeployment-v1-apps.

Afaict this PR would be much smaller if we removed this feature and tried to find a different way to test it.

Member Author

do we replace all control-plane nodes at once?

Yes, I also tested it by setting the budget to 5 when I had 3:2 nodes. But "at once" only applies to the world as the operator sees it; the join service still forbids multiple control-plane nodes joining at the same time. Note that this is the scenario right after initializing a Constellation with >=3 control planes. Also, "replace" means adding the new nodes first, so in theory you could go from 3 control planes to 6, since the operator only removes a node once the handover is finished.

Should we have different budgets for control planes and for workers?

We might do that in the future, if a customer requires it or we think we need it.

Could this be relative?

I assume as in a percentage value. Sure, but this is more difficult/complex than setting the number of nodes.

Should we rather implement a different upgrade algorithm

I think that bug is orthogonal to this proposal, since I'm not reworking the operator's replacement algorithm, which would be a large undertaking in my opinion.

I'm also reminded of the evolution of https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.30/#rollingupdatedeployment-v1-apps

I don't know the evolution, is there a summary somewhere of what happened?

Note also that the operator API is at version v1alpha1, so in my opinion we don't have to provide any API stability guarantees between Constellation versions and can completely change the whole upgrade process and APIs between Constellation versions.

Some of the current fields are not really user-facing in the sense that the user should use them directly. Changing the image reference requires 1. the image reference to be one of our image references and 2. the measurements in the join config to match the image. Changing the K8s version requires a config map under that name, and for the Constellation to upgrade correctly it has to contain the right set of components and patches.

Afaict this PR would be much smaller if we removed this feature and tried to find a different way to test it.

Then I'll have another try at the test for this, but this might take a bit of time before I get back to this.
