Non-leader contour controller pod memory keeps increasing until OOM #6860

Open
Levi080513 opened this issue Jan 16, 2025 · 3 comments · May be fixed by #6872
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

Comments


Levi080513 commented Jan 16, 2025

What steps did you take and what happened:

  1. Install Contour with Helm, with contour.replicaCount set to 2 and the following envoy service config:
envoy:
  service:
    type: LoadBalancer
  2. The memory usage of one of the Contour controller pods keeps increasing until the pod is OOM-killed.

What did you expect to happen:

All Contour controller pods work well.

Anything else you would like to add:

The pod's heap memory profile (generated by https://github.com/cloudwego/goref):
[image]

The pod's goroutine dump:
[image]

The goroutine dump shows that the service event consumer hangs in ServiceStatusLoadBalancerWatcher.notify.

On a non-leader Contour controller pod, loadBalancerStatusWriter is not started, so nothing reads from the channel and ServiceStatusLoadBalancerWatcher.notify blocks on its channel write.

As a result, service events are cached in pendingNotifications.ringGrowing and never consumed. As new events arrive, more of them accumulate in pendingNotifications.ringGrowing, so memory keeps growing until the pod is OOM-killed.
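
The failure mode described here can be reproduced in isolation. The following is a minimal sketch, not Contour's code: the `event`, `queue`, `watcher`, and `simulate` names are hypothetical stand-ins, and the non-blocking variant only contrasts the behavior; it is not necessarily how the linked pull request fixes the problem. It assumes an unbuffered channel with no reader, as on a non-leader pod.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// event stands in for a Service update; the real payload type differs.
type event struct{ id int }

// queue is a hypothetical unbounded buffer playing the role of the
// pending-notification buffer described in this issue.
type queue struct {
	mu      sync.Mutex
	pending []event
}

func (q *queue) push(e event) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending = append(q.pending, e)
}

func (q *queue) pop() (event, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.pending) == 0 {
		return event{}, false
	}
	e := q.pending[0]
	q.pending = q.pending[1:]
	return e, true
}

func (q *queue) size() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.pending)
}

// watcher is a hypothetical stand-in for the status watcher.
type watcher struct {
	out chan event // unbuffered; nothing reads it when the writer is not running
}

// notifyBlocking never returns if no goroutine receives from w.out.
func (w *watcher) notifyBlocking(e event) { w.out <- e }

// notifyNonBlocking drops the event when no reader is ready (an assumption
// made for this sketch only).
func (w *watcher) notifyNonBlocking(e event) {
	select {
	case w.out <- e:
	default:
	}
}

// simulate queues n events, starts a single consumer that calls notify for
// each one, waits briefly, and returns how many events are still queued.
func simulate(n int, notify func(event)) int {
	q := &queue{}
	for i := 0; i < n; i++ {
		q.push(event{id: i})
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			e, ok := q.pop()
			if !ok {
				return
			}
			notify(e) // a blocking notify stalls the whole consumer here
		}
	}()
	select {
	case <-done:
	case <-time.After(200 * time.Millisecond):
	}
	return q.size()
}

func main() {
	blocked := &watcher{out: make(chan event)}
	fmt.Println("left in queue, blocking notify:", simulate(5, blocked.notifyBlocking))

	dropping := &watcher{out: make(chan event)}
	fmt.Println("left in queue, non-blocking notify:", simulate(5, dropping.notifyNonBlocking))
}
```

With the blocking send, the consumer stalls on the first event and everything behind it stays queued. In a long-running process where events keep arriving, that backlog grows without bound, which matches the memory growth reported above.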

Environment:

  • Contour version: 1.27.0
  • Kubernetes version: (use kubectl version): 1.26.15
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
Levi080513 added the kind/bug and lifecycle/needs-triage labels Jan 16, 2025

Hey @Levi080513! Thanks for opening your first issue. We appreciate your contribution and welcome you to our community! We are glad to have you here and to have your input on Contour. You can also join us on our mailing list and in our channel in the Kubernetes Slack Workspace

Contributor

erwbgy commented Jan 18, 2025

Contour v1.27.0 was released in October 2023, so it is very old. Using the latest version is usually best, but if you have to stay on v1.27, try v1.27.4 from June 2024 and see if you get the same result.

tsaarni removed the lifecycle/needs-triage label Jan 20, 2025
Member

tsaarni commented Jan 20, 2025

@Levi080513 Thank you for the report! I can confirm that the issue is reproducible with the latest version.

tsaarni self-assigned this Jan 20, 2025
tsaarni linked a pull request Jan 20, 2025 that will close this issue