context with timeout #4007

dsxing · 2023-09-05T03:02:15Z

We already use a 30-second timeout in stopUnit/startUnit, so we may need to pass that timeout along with the context.

kolyshkin · 2023-09-06T00:43:06Z

@dsxing and how do you think the two timeouts will interact?

lifubang · 2023-09-06T11:07:02Z

Maybe related to #3904 (comment).
I agree with cyphar's opinion.

dsxing · 2023-09-07T07:45:32Z

> @dsxing and how do you think the two timeouts will interact?

The two timeouts are set to the same value so that the systemd operation is halted when the original opencontainers/runc logic returns a timeout error. This matters because, after runc returns an error, kubelet performs a re-sync and calls opencontainers/runc again to start the task.
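
To illustrate the idea, here is a minimal sketch of one deadline covering both the job submission and the wait for its result. It is not the code in this PR: `submitFunc`, `startUnitSketch`, and `demo` are hypothetical names, and the fake "hung systemd" merely stands in for a call such as go-systemd's StartTransientUnitContext.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// submitFunc is a hypothetical placeholder for a call that queues a systemd
// job and later reports the job result on statusChan. It is not runc code.
type submitFunc func(ctx context.Context, statusChan chan<- string) error

// startUnitSketch bounds both the submission and the wait for the job
// result with a single deadline derived from timeout.
func startUnitSketch(parent context.Context, timeout time.Duration, submit submitFunc) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	statusChan := make(chan string, 1)
	if err := submit(ctx, statusChan); err != nil {
		return err
	}
	select {
	case s := <-statusChan:
		if s != "done" {
			return fmt.Errorf("unit job finished with result %q", s)
		}
		return nil
	case <-ctx.Done():
		// The same deadline that bounded the submission ends the wait.
		return fmt.Errorf("timed out waiting for systemd: %w", ctx.Err())
	}
}

// demo runs the sketch against a fake systemd that accepts the job but
// never reports a result, and says whether a deadline error came back.
func demo() bool {
	hung := func(ctx context.Context, statusChan chan<- string) error { return nil }
	err := startUnitSketch(context.Background(), 50*time.Millisecond, hung)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("hung systemd ends in deadline error:", demo())
}
```

With a single context there is only one clock, so the question of which of two equal timers fires first does not arise.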

dsxing · 2023-09-07T07:47:17Z

> Maybe related to #3904 (comment). I agree with cyphar's opinion.

I encountered a memory leak in kubelet that was caused by systemd hanging. The kubelet logs showed a large number of "Failed to delete cgroup paths" and "Timed out while waiting for systemd to remove..." messages. Analyzing the kubelet stack traces, I found that opencontainers/runc was calling into the go-systemd library and that numerous goroutines were deadlocked inside go-systemd. I am therefore wondering whether the timeout can be passed from runc to go-systemd so that pending go-systemd calls are cancelled, which would avoid the accumulation of goroutines that leads to the memory leak in kubelet. Note that systemd cgroup v1 is being used.

lifubang · 2023-09-09T01:39:41Z

@dsxing Please sign off your commit first.

git rebase HEAD~1 --signoff
git push -f

Next time, please use git commit -s to commit your changes.

dsxing · 2023-09-09T05:52:43Z

> @dsxing Please sign off your commit first.
> git rebase HEAD~1 --signoff
> git push -f
> Next time, please use git commit -s to commit your changes.

ok, it's done.

kolyshkin

My concerns (or, maybe, questions) here are:

How do the two 30-second timeouts interact (as they will be hit at about the same time)?
How does the newly added context interact with retryOnDisconnect?
Why do we need two timeouts? Maybe it's better to replace the old one with the one from the context?

Without a good understanding of these topics I'm hesitant to approve this.

dsxing · 2023-10-24T03:33:11Z

> My concerns (or, maybe, questions) here are:
>
> How do the two 30-second timeouts interact (as they will be hit at about the same time)?
> How does the newly added context interact with retryOnDisconnect?
> Why do we need two timeouts? Maybe it's better to replace the old one with the one from the context?
>
> Without a good understanding of these topics I'm hesitant to approve this.

I agree that we should use a single timeout parameter. As I understand it, retryOnDisconnect only reconnects when the D-Bus connection is lost; it is not directly related to the result of the function call, which is reported back through err.

When err is nil, the final result of the call is determined through statusChan.
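
A rough sketch of that division of labour, using only the standard library. All names here (`errDisconnected`, `retryOnDisconnectSketch`, `startUnitWithRetry`, `demo`) are hypothetical and only model the behaviour described above; this is not the runc implementation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errDisconnected is a hypothetical stand-in for a lost D-Bus connection.
var errDisconnected = errors.New("dbus: connection closed")

// retryOnDisconnectSketch retries op while it fails with errDisconnected and
// stops once ctx is done. Any other result, including nil, is returned to
// the caller unchanged.
func retryOnDisconnectSketch(ctx context.Context, op func(context.Context) error) error {
	for {
		err := op(ctx)
		if !errors.Is(err, errDisconnected) {
			return err
		}
		if ctx.Err() != nil {
			return ctx.Err()
		}
		// A real implementation would re-establish the connection here.
	}
}

// startUnitWithRetry submits the job with retries, then waits on statusChan
// for the job result. Both steps share the deadline carried by ctx.
func startUnitWithRetry(ctx context.Context, submit func(context.Context, chan<- string) error) (string, error) {
	statusChan := make(chan string, 1)
	err := retryOnDisconnectSketch(ctx, func(ctx context.Context) error {
		return submit(ctx, statusChan)
	})
	if err != nil {
		return "", err // the call itself failed, so no job result will follow
	}
	select {
	case s := <-statusChan:
		return s, nil // err was nil, so the outcome comes from statusChan
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// demo simulates one disconnect followed by a successful submission.
func demo() (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	attempts := 0
	return startUnitWithRetry(ctx, func(_ context.Context, ch chan<- string) error {
		attempts++
		if attempts == 1 {
			return errDisconnected
		}
		ch <- "done"
		return nil
	})
}

func main() {
	fmt.Println(demo())
}
```

Checking `ctx.Err()` inside the retry loop is a design choice in this sketch: it keeps a flapping connection from retrying past the caller's deadline.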

AkihiroSuda approved these changes Sep 5, 2023

AkihiroSuda added the area/systemd label Sep 5, 2023

context with timeout

1486a45

Signed-off-by: dsxing <[email protected]>

dsxing force-pushed the main branch from 48d44d3 to 1486a45 Compare September 9, 2023 05:40

dsxing requested a review from AkihiroSuda September 24, 2023 08:36

kolyshkin requested changes Sep 29, 2023

dsxing added 2 commits October 10, 2023 10:52

use the same timeout arg

4ca587f

Signed-off-by: dsxing <[email protected]>

return err back

4e4d98c

Signed-off-by: dsxing <[email protected]>

dsxing requested a review from kolyshkin October 24, 2023 03:33

dsxing marked this pull request as draft November 24, 2023 09:55

woehrl01 mentioned this pull request Mar 22, 2024

Timeout waiting for systemd to create kubepods-burstable-***.slice kubernetes/kubernetes#120103

Open
