ACK lost when GSO enabled #11198
cc @kevinGC
The gVisor sandbox appears to hang when the error occurs. When executing curl with a large payload it blocks, and it is impossible to execute any other command that uses the network (commands that do not access the network work fine). I attach a dump of the goroutines while blocked. runsc version release-20241118.0-15-gb15656de596e
Thanks for the panic + repro. I'll take a look.
@kevinGC any recommendation on where to start looking to try to debug it myself? |
I think you want to find where those ACKs are getting lost, so I'd recommend:
The printf debugging is manual but usually takes only a few iterations.
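The printf approach needs a quick way to tell which segments are ACKs as they pass each layer. A self-contained sketch of decoding the flags from a raw TCP header (illustrative only; this is not gVisor code, and netstack has its own header package):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tcpFlags extracts ACK/SYN/FIN from a raw TCP header.
// Byte 13 of the header holds the flag bits (CWR ECE URG ACK PSH RST SYN FIN).
func tcpFlags(hdr []byte) (ack, syn, fin bool) {
	f := hdr[13]
	return f&0x10 != 0, f&0x02 != 0, f&0x01 != 0
}

func main() {
	// Minimal 20-byte TCP header with only the ACK flag set
	// and acknowledgment number 42.
	hdr := make([]byte, 20)
	binary.BigEndian.PutUint32(hdr[8:12], 42) // acknowledgment number
	hdr[12] = 5 << 4                          // data offset: 5 words (20 bytes)
	hdr[13] = 0x10                            // ACK
	ack, syn, fin := tcpFlags(hdr)
	fmt.Println(ack, syn, fin) // true false false
}
```

Logging a line like this at each hop makes it easy to bisect where the ACKs stop appearing.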
@GerardGarcia could you show output of
of course
@kevinGC should I look for something in particular in the
Description
It appears that ACKs are not processed by the gVisor netstack when the packet is big enough to be fragmented somewhere down the network stack. This causes the TCP connection to misbehave, with the client retransmitting and the server sending duplicate ACKs. If GSO is disabled (`--gso=false`) or the whole gVisor network stack is bypassed (`--network=host`), the connection works as expected.

I attach a few network dumps:

At the gVisor sandboxed container veth:

At gVisor (`--pcap-log`):

Outside the gVisor sandboxed container veth:

My interpretation is that the ACKs at packets 11/12 are not seen by netstack, which causes the retransmissions and duplicate ACKs.
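For reference, the `--gso=false` workaround can be applied per-runtime in Docker by passing it through `runtimeArgs`. A sketch of `/etc/docker/daemon.json` (the runsc binary path is an assumption and depends on the installation):

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--gso=false"]
    }
  }
}
```

This disables GSO only for containers started with `--runtime=runsc`, which is less drastic than falling back to `--network=host`.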
Steps to reproduce
In our environment it is straightforward to replicate: just send a request with a large payload using, for example, curl:
curl -XPOST http://httpbin.org/post -d @req_large.json
If the request is smaller (payload less than 1420 B), everything works as expected.
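The attached `req_large.json` is not included here; any JSON body over the ~1420-byte threshold should trigger it. A small Go sketch (file name taken from the curl command above) that writes such a payload:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// largePayload returns a JSON body comfortably above the ~1420-byte
// threshold mentioned in the report.
func largePayload() string {
	return fmt.Sprintf(`{"data":%q}`, strings.Repeat("x", 2000))
}

func main() {
	body := largePayload()
	if err := os.WriteFile("req_large.json", []byte(body), 0o644); err != nil {
		panic(err)
	}
	fmt.Println(len(body)) // well above 1420 bytes
}
```

The 1420-byte figure lines up with a typical 1500-byte MTU minus IP/TCP header overhead, which is consistent with the fragmentation theory above.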
runsc version
docker version (if using docker)
uname
Linux (...) 5.15.166-111.163.amzn2.x86_64 #1 SMP Fri Sep 6 21:31:40 UTC 2024 x86_64 GNU/Linux
kubectl (if using Kubernetes)
We run gVisor sandboxes within a pod; we are not using gVisor to wrap Kubernetes pods.
repo state (if built from source)
No response
runsc debug logs (if available)