
(module name): (short issue description) #1053

Open
NicoleY666 opened this issue Aug 3, 2024 · 2 comments
Labels
bug Something isn't working

Comments


NicoleY666 commented Aug 3, 2024

Describe the bug

Here is the addon deployment:

const region = "ap-southeast-2";
const karpenterAddOn = new blueprints.addons.KarpenterAddOn({
  version: '0.35.5',
  nodePoolSpec: {
    requirements: [
      { key: 'node.kubernetes.io/instance-type', operator: 'In', values: ["c5d.4xlarge", "c6a.2xlarge", "c6a.4xlarge", "c6a.8xlarge", "c6a.16xlarge"] },
      { key: 'topology.kubernetes.io/zone', operator: 'In', values: [`${region}a`, `${region}b`, `${region}c`] },
      { key: 'kubernetes.io/arch', operator: 'In', values: ['amd64', 'arm64'] },
      { key: 'karpenter.sh/capacity-type', operator: 'In', values: ['on-demand'] },
    ],
    disruption: {
      consolidationPolicy: "WhenEmpty",
      consolidateAfter: "30s",
      expireAfter: "72h",
      budgets: [{ nodes: "10%" }]
    }
  },
  ec2NodeClassSpec: {
    amiFamily: "AL2",
    subnetSelectorTerms: [{ tags: { "ops:repo": "xxxx" } }],
    securityGroupSelectorTerms: [{ tags: { "aws:eks:cluster-name": 'xxxxx' } }],
  },
  interruptionHandling: true,
  podIdentity: false,
});

const addOns: Array<blueprints.ClusterAddOn> = [
  new blueprints.addons.CalicoOperatorAddOn(),
  new blueprints.addons.MetricsServerAddOn(),
  new blueprints.addons.AwsLoadBalancerControllerAddOn({
    enableWaf: false,
    version: mapping[env].helmChartVersion,
  }),
  new blueprints.addons.VpcCniAddOn(),
  new blueprints.addons.CoreDnsAddOn(),
  new blueprints.addons.KubeProxyAddOn(),
  new blueprints.addons.SSMAgentAddOn(),
  new blueprints.addons.CloudWatchInsights(),
  karpenterAddOn
];

Attached: the new node without any deployments.

[Screenshot 2024-08-03 at 9:53:23 PM: node details]

Expected Behavior

I expect Karpenter to scale nodes up and down automatically, like a smart autoscaler.

Current Behavior

There are three nodes at the same time:
NAME↑ STATUS ROLE TAINTS VERSION PODS CPU MEM %CPU %MEM CPU/A MEM/A AGE
ip-10-60-63-193.ap-southeast-2.compute.internal Ready 0 v1.29.3-eks-ae9a62a 14 113 1866 0 6 15890 28360 32h
ip-10-60-74-255.ap-southeast-2.compute.internal Ready 0 v1.29.3-eks-ae9a62a 16 130 1755 0 6 15890 28360 31h
ip-10-60-95-42.ap-southeast-2.compute.internal Ready 3 v1.29.6-eks-1552ad0 9 0 0 0 0 7910 14640 8h

The third node is not scaled down as expected, even though there are no new deployments. When I checked the pods on the third node, I found calico-system calico-typha-988d6c9c5-fh55r (which is not a DaemonSet pod) blocking Karpenter from scaling the node down. This pod is deployed by CalicoOperatorAddOn(), which creates three pods:
calico-system calico-node-jnx6c (DaemonSet)
calico-system calico-typha-988d6c9c5-fh55r (Deployment)
calico-system csi-node-driver-ld48d (DaemonSet)

Since calico-typha is created by the addon, I don't know how to make Karpenter scale the node down as expected.
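A possible workaround (a sketch, not a verified fix for this cluster): on the Karpenter v1beta1 API used by chart 0.35.x, changing consolidationPolicy from "WhenEmpty" to "WhenUnderutilized" allows Karpenter to consolidate nodes that still run reschedulable non-DaemonSet pods such as calico-typha. Note that v1beta1 does not accept consolidateAfter together with "WhenUnderutilized", so that field has to be dropped:

```typescript
// Sketch of an alternative disruption block for nodePoolSpec (assumption:
// the blueprint passes this through to a Karpenter v1beta1 NodePool).
// "WhenUnderutilized" consolidates nodes whose pods can be rescheduled
// elsewhere, even when non-DaemonSet pods (e.g. calico-typha) are present.
const disruption = {
  consolidationPolicy: "WhenUnderutilized", // was "WhenEmpty"
  // consolidateAfter is intentionally omitted: the v1beta1 NodePool schema
  // rejects it when the policy is WhenUnderutilized.
  expireAfter: "72h",
  budgets: [{ nodes: "10%" }],
};
```

The trade-off is that Karpenter may then evict calico-typha during consolidation; a PodDisruptionBudget on the typha Deployment (the Tigera operator typically manages one) should keep at least one replica available while the pod is moved.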

Reproduction Steps

The deployed code is shown above.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.147.3

EKS Blueprints Version

1.15.1

Node.js Version

20

Environment details (OS name and version, etc.)

EKS

Other information

No response

NicoleY666 added the bug label on Aug 3, 2024
shapirov103 (Collaborator) commented:

@NicoleY666

The CalicoOperator addon deploys only the operator; that component then deploys the pods you mentioned. This behavior is controlled by Calico itself.

Let me understand the issue: you have a node with calico pods running. Your screenshots show the pods running on ip-10-60-95-42. Is that the node that you want to scale down or are there any nodes with no pods which are not scaled down?

Calico CNI is not a component that we support functionally. While we support provisioning of that component (operator), the actual software is maintained by the Calico community (or Tigera for enterprise support). In general, CNI components are considered to be mission critical and may have specific disruption rules applied.

NicoleY666 (Author) commented:

Yes, you are correct, but that pod is blocking the node from scaling down, since the node is not considered empty (DaemonSet pods excluded). Is there a way to exclude certain Deployments so the node can still scale down? The policy is WhenEmpty, so a node can only be scaled down when it runs nothing but DaemonSet pods. However, calico-typha is created as a Deployment controlled by the CalicoOperator addon, and I can't turn it into a DaemonSet. With the calico-typha pod on the node, Karpenter never marks the node as empty. As the attached screenshots show, there shouldn't be four nodes each under 20% usage.

[screenshots: node utilization]

There is also a node running only DaemonSet pods that is not scaled down as expected; I'm not quite sure why. [screenshot]

Here is the DaemonSet screenshot: [screenshot]
