
error: cannot start pipelines: cannot get pod from kubelet #1417

Open
GotoRen opened this issue Nov 11, 2024 · 0 comments
GotoRen commented Nov 11, 2024

Describe the bug

After upgrading EKS nodes to version v1.29 (v1.29.8-20241024) and deploying CloudWatch Agent v1.3xxx, the following error is encountered:

2024-11-06T06:49:35Z I! Starting AmazonCloudWatchAgent CWAgent/1.300028.1b210 (go1.20.7; linux; amd64)
2024-11-06T06:49:35Z I! AWS SDK log level not set
2024-11-06T06:49:35.353Z	info	service/telemetry.go:96	Skipping telemetry setup.	{"address": "", "level": "None"}
2024-11-06T06:49:35.356Z	info	service/service.go:131	Starting CWAgent...	{"Version": "1.300028.1b210", "NumCPU": 4}
2024-11-06T06:49:35.356Z	info	extensions/extensions.go:30	Starting extensions...
2024-11-06T06:49:35.374Z	info	host/ec2metadata.go:89	Fetch instance id and type from ec2 metadata	{"kind": "receiver", "name": "awscontainerinsightreceiver", "data_type": "metrics"}
2024-11-06T06:49:35.383Z	info	service/service.go:157	Starting shutdown...
2024-11-06T06:49:35.383Z	info	extensions/extensions.go:44	Stopping extensions...
2024-11-06T06:49:35.383Z	info	service/service.go:171	Shutdown complete.
Error: cannot start pipelines: cannot get pod from kubelet, err: call to /pods endpoint failed: Get "https://<host_ip>:10250/pods": remote error: tls: internal error
2024-11-06T06:49:35Z E! [telegraf] Error running agent: cannot start pipelines: cannot get pod from kubelet, err: call to /pods endpoint failed: Get "https://<host_ip>:10250/pods": remote error: tls: internal error

url := fmt.Sprintf("https://%s:%s/pods", k.KubeIP, k.Port)
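The quoted line is where the agent builds the kubelet request from the node IP and port 10250. A minimal sketch of an equivalent client, assuming the helper name `podsURL` and the TLS settings are illustrative rather than the agent's actual code:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// podsURL mirrors the fmt.Sprintf call quoted above (illustrative helper name).
func podsURL(kubeIP, port string) string {
	return fmt.Sprintf("https://%s:%s/pods", kubeIP, port)
}

func main() {
	url := podsURL("10.0.0.1", "10250") // placeholder node IP
	fmt.Println(url)

	// An HTTPS client along these lines would issue the request; against a
	// kubelet that cannot present a serving certificate, Get fails during
	// the handshake with "remote error: tls: internal error".
	client := &http.Client{
		Timeout:   5 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	_ = client // client.Get(url) is not executed here; no kubelet is reachable
}
```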

Note that the EKS Control-Plane was upgraded to v1.29 before proceeding with the node upgrade.

Steps to reproduce

At first, I upgraded the EKS cluster from version v1.28 to v1.29.
Then, I upgraded the node version from v1.27 to v1.29.

The nodes skip a version because I alternate between Blue and Green node groups, so the new group jumps from v1.27 to v1.29.

After upgrading the node version to v1.29, the CloudWatch Agent started producing the aforementioned error.

What did you expect to see?

After the cluster upgrade, I expected the CloudWatch Agent to start without errors. Specifically, when the agent requests the kubelet's /pods endpoint on a running instance to retrieve pod data, the TLS error (tls: internal error) should not occur.

What version did you use?

  • Control-Plane: v1.29
  • Data-Plane (EKS node): v1.29.8-20241024
    • kubelet: v1.29.8-eks-a737599
  • CloudWatch Agent: v1.300028.1b210

What config did you use?

We are using the container image public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.300028.1b210

Environment

  • AMI: AL2_x86_64
  • Instance type: c6i.2xlarge
  • OS architecture: linux (amd64)
  • OS image: Amazon Linux 2

Note: IMDSv2 is set to optional (i.e., not enforced).

Additional comment

A similar issue has been observed, but it remains unresolved. This error seems to occur even when IMDSv2 is enabled.
