
Removing fields from a template doesn't remove them in the resulting resource #11773

Open
cwrau opened this issue Jan 30, 2025 · 6 comments
Labels
area/clusterclass: Issues or PRs related to clusterclass
kind/bug: Categorizes issue or PR as related to a bug.
needs-priority: Indicates an issue lacks a `priority/foo` label and requires one.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@cwrau
Contributor

cwrau commented Jan 30, 2025

What steps did you take and what happened?

We have a (kubeadmcontrolplane) template:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlaneTemplate
spec:
  template:
    spec:
      kubeadmConfigSpec:
        initConfiguration:
          nodeRegistration:
            imagePullPolicy: IfNotPresent
            kubeletExtraArgs:
              cloud-provider: external
            name: '{{ local_hostname }}'
          patches:
            directory: /etc/kubernetes/patches
        joinConfiguration:
          nodeRegistration:
            imagePullPolicy: IfNotPresent
            kubeletExtraArgs:
              cloud-provider: external
            name: '{{ local_hostname }}'

Before the change, it looked like this:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlaneTemplate
spec:
  template:
    spec:
      kubeadmConfigSpec:
        initConfiguration:
          localAPIEndpoint: {}
          nodeRegistration:
            imagePullPolicy: IfNotPresent
            kubeletExtraArgs:
              cloud-provider: external
              event-qps: "0"
              feature-gates: SeccompDefault=true
              protect-kernel-defaults: "true"
              seccomp-default: "true"
              tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
            name: '{{ local_hostname }}'
          patches:
            directory: /etc/kubernetes/patches
        joinConfiguration:
          discovery: {}
          nodeRegistration:
            imagePullPolicy: IfNotPresent
            kubeletExtraArgs:
              cloud-provider: external
              event-qps: "0"
              feature-gates: SeccompDefault=true
              protect-kernel-defaults: "true"
              seccomp-default: "true"
              tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
            name: '{{ local_hostname }}'

As you can see, we removed all kubeletExtraArgs aside from cloud-provider.

But the resulting resource, in this case the KubeadmControlPlane, still has these fields set; I assume this is because of a merge apply instead of the resource being completely overridden.

This can result in a broken cluster that needs manual intervention if the removed fields are no longer valid, which is the case for a few of these args.
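To illustrate the effect, a minimal sketch of what we see on the rendered KubeadmControlPlane after the template change (field names taken from the YAMLs above; the exact live state varies):

# Expected: only the args still present in the template
kubeletExtraArgs:
  cloud-provider: external

# Observed: args removed from the template are still set on the resource
kubeletExtraArgs:
  cloud-provider: external
  event-qps: "0"
  feature-gates: SeccompDefault=true
  protect-kernel-defaults: "true"
  seccomp-default: "true"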

What did you expect to happen?

That the resource (KubeadmControlPlane) would be the exact result of the template instead of a merged mess.

Cluster API version

1.8.5

Kubernetes version

1.28.15

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug
/area clusterclass

@k8s-ci-robot added the kind/bug, area/clusterclass, needs-priority, and needs-triage labels on Jan 30, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chrischdi
Member

How did you apply it? Can you post it in a way we could reproduce it?

Does the behaviour change if you use kubectl apply --server-side=true (on create and on update)?
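i.e. something along these lines (the file name is hypothetical):

# Apply the template with server-side apply, on the initial create and on every
# subsequent update, so the API server tracks field ownership per manager and
# can prune fields that disappear from the applied manifest:
kubectl apply --server-side=true -f kubeadmcontrolplanetemplate.yaml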

@sbueringer
Member

sbueringer commented Jan 30, 2025

> We have a (kubeadmcontrolplane) template:

The first YAML shows a KubeadmControlPlane not a KubeadmControlPlaneTemplate

> But the resulting resource, in this case the KubeadmControlPlane, still has these fields set; I assume this is because of a merge apply instead of the resource being completely overridden.

We use SSA in the Cluster topology controller, so I would have assumed this works.
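As background on why SSA matters here, a sketch of how to inspect field ownership on the live object (name and namespace taken from the controlPlaneRef in the cluster.yaml below; the generated name may differ):

# Show which field manager owns each field on the rendered KubeadmControlPlane.
# With server-side apply, a manager only prunes fields it owns when they vanish
# from its applied manifest; fields owned by another manager are left untouched.
kubectl get kubeadmcontrolplane mgmt-prod-control-plane \
  -n cluster-mgmt-prod --show-managed-fields -o yaml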

@cwrau
Contributor Author

cwrau commented Jan 30, 2025

> How did you apply it? Can you post it in a way we could reproduce it?

I apply the cluster-api templates with Helm via a Flux HelmRelease. The source is https://github.com/teutonet/teutonet-helm-charts/tree/main/charts/t8s-cluster. This is a bit "tricky" to reproduce, as CAPO needs credentials which aren't managed by the cluster-api ecosystem; we have our own operator that fills this gap.
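Roughly how this is wired up (a sketch, not our exact manifests; the repository URL and resource names are assumptions, and only the relevant fields are shown):

apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: teutonet                    # hypothetical name
spec:
  interval: 1h
  url: https://teutonet.github.io/teutonet-helm-charts   # assumed repository URL
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: mgmt-prod                   # hypothetical name
spec:
  interval: 1h
  chart:
    spec:
      chart: t8s-cluster
      sourceRef:
        kind: HelmRepository
        name: teutonet
  values: {}                        # the "first values" / "second values" below go here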

First values:

bastion:
  enabled: false
cloud: ffm3-prod
controlPlane:
  flavor: standard.4.1905
metadata:
  customerID: 1111
  customerName: teuto-net
  serviceLevelAgreement: 24x7
nodePools:
  s2-default:
    availabilityZone: Zone2
    flavor: standard.2.1905
    replicas: 4
version:
  major: 1
  minor: 28
  patch: 15

Second values:

bastion:
  enabled: false
cloud: ffm3-prod
controlPlane:
  flavor: standard.4.1905
metadata:
  customerID: 1111
  customerName: teuto-net
nodePools:
  s2-default:
    availabilityZone: Zone2
    flavor: standard.2.1905
    replicas: 4
version:
  major: 1
  minor: 29
  patch: 13

But in principle, creating these resources and providing a Cluster API cluster should reproduce the same problem.

First cluster.yaml:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: mgmt-prod-cluster
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: mgmt-prod-control-plane
    namespace: cluster-mgmt-prod
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: OpenStackCluster
    name: mgmt-prod-cluster
    namespace: cluster-mgmt-prod
  topology:
    class: mgmt-prod-cluster
    controlPlane:
      metadata:
        labels:
          t8s.teuto.net/cluster: mgmt-prod
          t8s.teuto.net/role: management
      replicas: 3
    variables:
    - name: dnsNameservers
      value:
      - *replace-me
      - *replace-me
    - name: controlPlaneServerGroupID
      value: *replace-me
    - name: machineDeploymentFlavor
      value: compute-plane-placeholder
    version: v1.28.15
    workers:
      machineDeployments:
      - class: compute-plane
        failureDomain: Zone2
        metadata: {}
        name: s2-default
        replicas: 4
        variables:
          overrides:
          - name: machineDeploymentServerGroupID
            value: *replace-me
          - name: machineDeploymentFlavor
            value: standard.2.1905

Second cluster.yaml:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: mgmt-prod-cluster
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: mgmt-prod-control-plane
    namespace: cluster-mgmt-prod
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: OpenStackCluster
    name: mgmt-prod-cluster
    namespace: cluster-mgmt-prod
  topology:
    class: mgmt-prod-cluster
    controlPlane:
      metadata:
        labels:
          t8s.teuto.net/cluster: mgmt-prod
          t8s.teuto.net/role: management
      replicas: 3
    variables:
    - name: dnsNameservers
      value:
      - *replace-me
      - *replace-me
    - name: controlPlaneServerGroupID
      value: *replace-me
    - name: machineDeploymentFlavor
      value: compute-plane-placeholder
    version: v1.29.13
    workers:
      machineDeployments:
      - class: compute-plane
        failureDomain: Zone2
        metadata: {}
        name: s2-default
        replicas: 4
        variables:
          overrides:
          - name: machineDeploymentServerGroupID
            value: *replace-me
          - name: machineDeploymentFlavor
            value: standard.2.1905

> Does the behaviour change if you use kubectl apply --server-side=true (on create and on update)?

The resulting templates are correct; I don't know what another way of applying them (which would result in the same resource on the API server) would change.

>> We have a (kubeadmcontrolplane) template:

> The first YAML shows a KubeadmControlPlane not a KubeadmControlPlaneTemplate

Oh yeah, I fixed the YAMLs 👍

@sbueringer
Member

Just to ensure I understand this correctly.

So you are able to make the changes you want on the KubeadmControlPlaneTemplate, and the problem is that the changes are not applied the same way on the KubeadmControlPlane of a Cluster?

@cwrau
Contributor Author

cwrau commented Jan 31, 2025

> Just to ensure I understand this correctly.
>
> So you are able to make the changes you want on the KubeadmControlPlaneTemplate, and the problem is that the changes are not applied the same way on the KubeadmControlPlane of a Cluster?

Yes, or rather (because of forbidden changes 🙄) I delete the template, create another one, and change the template ref in the cluster, as sketched below.
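For reference, a minimal sketch of that rotation under the ClusterClass model (the -v2 name is made up for illustration, and only the relevant fields are shown). Most KubeadmControlPlaneTemplate fields are immutable, so the changed template gets a new name and the ClusterClass ref is repointed, which triggers a rollout:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlaneTemplate
metadata:
  name: mgmt-prod-control-plane-v2   # new name; the old template stays until nothing references it
spec:
  template:
    spec:
      kubeadmConfigSpec:
        initConfiguration:
          nodeRegistration:
            kubeletExtraArgs:
              cloud-provider: external
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: mgmt-prod-cluster
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: mgmt-prod-control-plane-v2   # repointed ref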
