Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

context with timeout #4007

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

context with timeout #4007

wants to merge 3 commits into from

Conversation

dsxing
Copy link

@dsxing dsxing commented Sep 5, 2023

We set a timeout of 30 seconds when doing stopUnit/startUnit, so we may need to pass the timeout args with the context

@kolyshkin
Copy link
Contributor

@dsxing and how do you think the two timeouts will interact?

@lifubang
Copy link
Member

lifubang commented Sep 6, 2023

Maybe relates to #3904 (#3904 (comment))
I agree with cyphar's opinion.

@dsxing
Copy link
Author

dsxing commented Sep 7, 2023

@dsxing and how do you think the two timeouts will interact?

The two timeouts are set to the same value with the purpose of halting systemd operations when the original opencontainers/runc logic returns a timeout error. This is because after runc returns an error, kubelet will perform a re-sync and call opencontainers/runc again to start the task.

@dsxing
Copy link
Author

dsxing commented Sep 7, 2023

Maybe relates to #3904 (#3904 (comment)) I agree with cyphar's opinion.

I encountered a scenario where there was a memory leak in kubelet, due to systemd hanging, the kubelet logs showed a significant number of "Failed to delete cgroup paths" and "Timed out while waiting for systemd to remove..." messages. Upon analyzing the kubelet stack traces, I found that the opencontainers/runc was making calls to the go-systemd library, and there were numerous deadlocked goroutines in go-systemd. Therefore, I am wondering if it's possible to pass the timeout information from runc to go-systemd to cancel go-systemd tasks, thus avoiding the accumulation of goroutines that could lead to a memory leak in kubelet. Note that systemd cgroup v1 is being used.

@lifubang
Copy link
Member

lifubang commented Sep 9, 2023

@dsxing Please signed off your commit first.

git rebase HEAD~1 --signoff
git push -f

next time, please use git commit -s to commit your changes.

Signed-off-by: dsxing <[email protected]>
@dsxing
Copy link
Author

dsxing commented Sep 9, 2023

@dsxing Please signed off your commit first.

git rebase HEAD~1 --signoff
git push -f

next time, please use git commit -s to commit your changes.

ok, it's done.

@dsxing dsxing requested a review from AkihiroSuda September 24, 2023 08:36
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns (or, maybe, questions) here are:

  1. How does two 30-second timeouts interact (as they are to be hit at about the same time)?
  2. How does a newly added context interacts with retryOnDisconnect?
  3. Why do we need two timeouts? Maybe it's better to replace the old one with the one from context?

Without a good understanding of these topics I'm hesitant to approve this.

@dsxing
Copy link
Author

dsxing commented Oct 24, 2023

My concerns (or, maybe, questions) here are:

  1. How does two 30-second timeouts interact (as they are to be hit at about the same time)?
  2. How does a newly added context interacts with retryOnDisconnect?
  3. Why do we need two timeouts? Maybe it's better to replace the old one with the one from context?

Without a good understanding of these topics I'm hesitant to approve this.

I also think we should use the same timeout parameter. Additionally, I understand that the role of retryOnDisconnect is to reconnect when the DBus connection is lost, and it's not directly related to the results of the function call, which are reported back in the 'err' parameter.

When err is nil, the final determination of the call's result will be made through statusChan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants