
error: cannot start pipelines: cannot get pod from kubelet #1417

Open
GotoRen opened this issue Nov 11, 2024 · 0 comments
GotoRen commented Nov 11, 2024

Describe the bug

After upgrading EKS nodes to version v1.29 (v1.29.8-20241024) and deploying CloudWatch Agent v1.3xxx, the following error is encountered:

2024-11-06T06:49:35Z I! Starting AmazonCloudWatchAgent CWAgent/1.300028.1b210 (go1.20.7; linux; amd64)
2024-11-06T06:49:35Z I! AWS SDK log level not set
2024-11-06T06:49:35.353Z	info	service/telemetry.go:96	Skipping telemetry setup.	{"address": "", "level": "None"}
2024-11-06T06:49:35.356Z	info	service/service.go:131	Starting CWAgent...	{"Version": "1.300028.1b210", "NumCPU": 4}
2024-11-06T06:49:35.356Z	info	extensions/extensions.go:30	Starting extensions...
2024-11-06T06:49:35.374Z	info	host/ec2metadata.go:89	Fetch instance id and type from ec2 metadata	{"kind": "receiver", "name": "awscontainerinsightreceiver", "data_type": "metrics"}
2024-11-06T06:49:35.383Z	info	service/service.go:157	Starting shutdown...
2024-11-06T06:49:35.383Z	info	extensions/extensions.go:44	Stopping extensions...
2024-11-06T06:49:35.383Z	info	service/service.go:171	Shutdown complete.
Error: cannot start pipelines: cannot get pod from kubelet, err: call to /pods endpoint failed: Get "https://<host_ip>:10250/pods": remote error: tls: internal error
2024-11-06T06:49:35Z E! [telegraf] Error running agent: cannot start pipelines: cannot get pod from kubelet, err: call to /pods endpoint failed: Get "https://<host_ip>:10250/pods": remote error: tls: internal error

url := fmt.Sprintf("https://%s:%s/pods", k.KubeIP, k.Port)
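The quoted line is where the agent builds the kubelet request from the node IP and port 10250. A minimal sketch of an equivalent client, assuming the helper name `podsURL` and the TLS settings are illustrative rather than the agent's actual code:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// podsURL mirrors the fmt.Sprintf call quoted above (illustrative helper name).
func podsURL(kubeIP, port string) string {
	return fmt.Sprintf("https://%s:%s/pods", kubeIP, port)
}

func main() {
	url := podsURL("10.0.0.1", "10250") // placeholder node IP
	fmt.Println(url)

	// An HTTPS client along these lines would issue the request; against a
	// kubelet that cannot present a serving certificate, Get fails during
	// the handshake with "remote error: tls: internal error".
	client := &http.Client{
		Timeout:   5 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	_ = client // client.Get(url) is not executed here; no kubelet is reachable
}
```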

Note that the EKS Control-Plane was upgraded to v1.29 before proceeding with the node upgrade.

Steps to reproduce

At first, I upgraded the EKS cluster from version v1.28 to v1.29.
Then, I upgraded the node version from v1.27 to v1.29.

The nodes skip a version because I alternate between Blue and Green node groups, so the new group jumps from v1.27 to v1.29.

After upgrading the node version to v1.29, the CloudWatch Agent started producing the aforementioned error.

What did you expect to see?

After the cluster upgrade, I expected the CloudWatch Agent to start without errors. Specifically, when the agent requests the kubelet's /pods endpoint on a running instance to retrieve pod data, the TLS error (tls: internal error) should not occur.

What version did you use?

  • Control-Plane: v1.29
  • Data-Plane (EKS node): v1.29.8-20241024
    • kubelet: v1.29.8-eks-a737599
  • CloudWatch Agent: v1.300028.1b210

What config did you use?

We are using the container image public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.300028.1b210

Environment

  • AMI: AL2_x86_64
  • Instance type: c6i.2xlarge
  • OS architecture: linux (amd64)
  • OS image: Amazon Linux 2

Note: IMDSv2 is set to optional (i.e., not enforced).

Additional comment

A similar issue has been observed, but it remains unresolved. This error seems to occur even when IMDSv2 is enabled.
