Potential race condition in connect with `--proxy-via` parameter #3781
Comments
@kbatalin As far as I can tell, the
I'm not saying that there isn't a race. Just pointing out that I think your conclusions are a bit off.
Thanks for the detailed explanation. Indeed, I was mistaken in thinking that
Context:
Observations:
State at this moment:
I think the connection may succeed in certain scenarios, for example:
Sorry for the confusion in my original message.
@kbatalin yes, this makes sense. I'll look into improving the logic.
@kbatalin please try this preview of the upcoming 2.21.3 release.
I tried to connect multiple times with different workloads, and it looks good. All attempts were successful, and in the logs I can see:
Thanks!
Describe the bug
A potential race condition occurs when connecting to a cluster with the `--proxy-via` flag. The issue is reproducible on clusters with more than one workload.

Symptoms:
When running the command `telepresence connect --proxy-via <cidr>=<workload>`, the following error appears:

Logs from the root daemon:
Debugging locally, I found that the issue originates in the `agentPods` goroutine, which depends on the `ProxyVia` value from the `--proxy-via` flag:

telepresence/pkg/client/agentpf/clients.go, lines 523 to 525 (bece8ba)
If this value is not set, the goroutine skips adding the required workload to the `clients` list and instead adds a random workload at the end:

telepresence/pkg/client/agentpf/clients.go, lines 539 to 547 (bece8ba)
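For readers without the source checked out, here is a minimal, self-contained sketch of the skip described above. All names here (`clients`, `proxyVia`, `scanPods`, the workload strings) are hypothetical placeholders for illustration, not the actual telepresence code at the referenced lines:

```go
package main

import "fmt"

// clients is a stand-in for the structure maintained by the agentPods
// goroutine; proxyVia mirrors the workloads named by --proxy-via.
type clients struct {
	proxyVia map[string]bool // hypothetical: populated from the --proxy-via flag
	active   []string
}

func (c *clients) addClient(workload string) {
	c.active = append(c.active, workload)
}

// scanPods illustrates the skip described in the report: while proxyVia
// is still empty, the workload required by --proxy-via is never pinned,
// and an arbitrary fallback workload is added instead.
func (c *clients) scanPods(workloads []string) {
	for _, w := range workloads {
		if c.proxyVia[w] {
			c.addClient(w)
		}
	}
	if len(c.active) == 0 && len(workloads) > 0 {
		c.addClient(workloads[0]) // arbitrary fallback workload
	}
}

func main() {
	c := &clients{proxyVia: map[string]bool{}} // ProxyVia not yet assigned
	c.scanPods([]string{"other-workload", "proxy-workload"})
	fmt.Println(c.active)
}
```

Running this prints `[other-workload]`: because `proxyVia` is still empty when the scan runs, the required workload is never pinned and the fallback wins.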
Meanwhile, the `ProxyVia` value is assigned in the separate `vif` goroutine:

telepresence/pkg/client/rootd/session.go, line 1247 (69c3021)
There is no synchronization between these two goroutines. In practice, this causes:

1. `agentPods` loads the list of pods, but since `ProxyVia` is not yet set, the goroutine skips adding the correct workload to `clients`.
2. The `vif` goroutine then searches for the required workload in the `clients` list (in the `WaitForWorkload` function) but exits immediately because the workload is not present.
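The same ordering problem, reduced to a standalone Go program with no telepresence code in it. The mutex keeps the memory accesses safe, but nothing orders the write before the read, so which branch prints depends entirely on goroutine scheduling, mirroring why connect only fails sometimes:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var mu sync.Mutex
	var proxyVia string // written by the "vif" side, read by the "agentPods" side

	var wg sync.WaitGroup
	wg.Add(2)

	// Plays the role of the vif goroutine: assigns the ProxyVia value.
	go func() {
		defer wg.Done()
		mu.Lock()
		proxyVia = "proxy-workload"
		mu.Unlock()
	}()

	// Plays the role of the agentPods goroutine: reads the value once.
	// Nothing forces this read to happen after the write above, so
	// either outcome is possible from run to run.
	go func() {
		defer wg.Done()
		mu.Lock()
		v := proxyVia
		mu.Unlock()
		if v == "" {
			fmt.Println("race lost: ProxyVia not yet set; required workload skipped")
		} else {
			fmt.Println("race won: pinned", v)
		}
	}()

	wg.Wait()
}
```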
To Reproduce

Run `telepresence connect --proxy-via cidr=workload`. The race is timing-dependent, so several attempts may be needed.
Expected behavior
The connection should succeed, with the correct workload added to the `clients` list and processed as expected.
Versions (please complete the following information):

- Output of `telepresence version` (preferably while telepresence is connected):
- Operating system of workstation running `telepresence` commands:

Additional context
Proposed solutions:

1. Modify the `WaitForWorkload` function to wait for the required workload to appear in the `clients` list instead of exiting immediately (see the sketch below).
2. Synchronize the `agentPods` and `vif` goroutines to ensure the `ProxyVia` value is set before `agentPods` runs.
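A minimal sketch of what option 1 could look like, assuming a condition-variable-guarded workload list. The names and signatures here are hypothetical and do not match telepresence's actual `WaitForWorkload`:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// workloadSet is a hypothetical stand-in for the clients list, guarded
// by a sync.Cond so that waiters wake whenever a workload is added.
type workloadSet struct {
	mu    sync.Mutex
	cond  *sync.Cond
	names map[string]bool
}

func newWorkloadSet() *workloadSet {
	ws := &workloadSet{names: map[string]bool{}}
	ws.cond = sync.NewCond(&ws.mu)
	return ws
}

func (ws *workloadSet) add(name string) {
	ws.mu.Lock()
	ws.names[name] = true
	ws.mu.Unlock()
	ws.cond.Broadcast() // wake anyone blocked in waitForWorkload
}

// waitForWorkload blocks until name appears or ctx expires, instead of
// returning immediately when the workload is not yet in the list.
func (ws *workloadSet) waitForWorkload(ctx context.Context, name string) error {
	stop := make(chan struct{})
	defer close(stop)
	go func() {
		select {
		case <-ctx.Done():
			ws.cond.Broadcast() // unblock the waiter so it can observe ctx.Err()
		case <-stop:
		}
	}()
	ws.mu.Lock()
	defer ws.mu.Unlock()
	for !ws.names[name] {
		if err := ctx.Err(); err != nil {
			return fmt.Errorf("workload %q never appeared: %w", name, err)
		}
		ws.cond.Wait()
	}
	return nil
}

func main() {
	ws := newWorkloadSet()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Simulate the scanning side adding the workload slightly later.
	go func() {
		time.Sleep(50 * time.Millisecond)
		ws.add("proxy-workload")
	}()

	if err := ws.waitForWorkload(ctx, "proxy-workload"); err != nil {
		fmt.Println("gave up:", err)
		return
	}
	fmt.Println("workload found")
}
```

Option 2 would instead impose an ordering, e.g. not starting the pod scan until the `ProxyVia` value has been published; that trades a little startup latency for a deterministic result.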