
receiving ping while closing connection causing ProtocolError #139

Closed
HoneyryderChuck opened this issue Aug 17, 2018 · 5 comments · Fixed by #140

Comments

@HoneyryderChuck (Collaborator)

I've run into a case where my connection receives a PING frame just after I call goaway on it. More or less this sequence:

1: {:length=>0, :type=>:data, :flags=>[:end_stream], :stream=>1, :payload=>0}
1: <- DATA: 0 bytes...
1: <- ""
# here I start cleaning up, as I won't need the connection anymore
: frame was buffered!
: {:type=>:goaway, :last_stream=>0, :error=>:no_error, :payload=>nil}
1: closing stream
READ: 17 bytes...
0: frame was received!
0: {:length=>8, :type=>:ping, :flags=>[], :stream=>0, :payload=>"\x00\x00\x00\x00\x00\x00\x00\x00"}
ouch: HTTP2::Error::ProtocolError

This happens because of this code here (after goaway is called on a connection, @state is marked as :closed).

I was looking into the RFC and couldn't figure out whether this is a strict check, beyond "An endpoint MUST NOT send frames other than PRIORITY on a closed stream." In my case the sender can't know the stream is closed, since the GOAWAY frame has only been buffered, not sent yet. On the other hand, this only happens because I'm draining the socket and reacting to every frame as it arrives, instead of draining the socket first and then doing things like closing the connection. Is this something the library could potentially figure out?
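To make the race concrete, here is a minimal, self-contained Ruby sketch (a toy model, not the gem's actual code) of why the error fires: goaway flips the connection state to :closed immediately, so a PING the peer sent before it could have seen our GOAWAY gets rejected.

```ruby
# Toy model of the race described above (NOT the http-2 gem's code).
class ProtocolError < StandardError; end

class ToyConnection
  attr_reader :state

  def initialize
    @state = :open
  end

  # Buffers a GOAWAY and transitions to :closed right away.
  def goaway
    @state = :closed
  end

  # Any frame received on a :closed connection raises, even a PING
  # that was already in flight when we closed.
  def receive(frame)
    raise ProtocolError, "frame received on closed connection" if @state == :closed
    frame
  end
end

conn = ToyConnection.new
conn.goaway
begin
  conn.receive(type: :ping, stream: 0, payload: "\x00" * 8)
rescue ProtocolError => e
  puts "ouch: #{e.class}"
end
```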

@igrigorik (Owner)

Hmm, I think this is an implementation decision that we need to revisit on our end. Specifically, the problem here is that we transition to closed immediately, but this doesn't account for the race condition of the receiver having frames that are already in flight.

Relevant spec section: https://tools.ietf.org/html/rfc7540#section-6.8

A server that is attempting to gracefully shut down a
connection SHOULD send an initial GOAWAY frame with the last stream
identifier set to 2^31-1 and a NO_ERROR code. This signals to the
client that a shutdown is imminent and that initiating further
requests is prohibited. After allowing time for any in-flight stream
creation (at least one round-trip time), the server can send another
GOAWAY frame with an updated last stream identifier. This ensures
that a connection can be cleanly shut down without losing requests.

After sending a GOAWAY frame, the sender can discard frames for
streams initiated by the receiver with identifiers higher than the
identified last stream. However, any frames that alter connection
state cannot be completely ignored. For instance, HEADERS,
PUSH_PROMISE, and CONTINUATION frames MUST be minimally processed to
ensure the state maintained for header compression is consistent (see
Section 4.3); similarly, DATA frames MUST be counted toward the
connection flow-control window. Failure to process these frames can
cause flow control or header compression state to become
unsynchronized.

A similar instance of the same problem shows up in the handling of (recently) closed streams:
https://github.com/igrigorik/http-2/blob/master/lib/http/2/connection.rb#L668-L678
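For illustration, the two-step graceful shutdown the quoted spec text describes could be sketched like this (a hypothetical helper, not part of the gem's API):

```ruby
# Sketch of RFC 7540 §6.8 graceful shutdown (hypothetical helper,
# not the http-2 gem's API).
MAX_STREAM_ID = 2**31 - 1

# Returns the two GOAWAY frames, in order, that a gracefully
# shutting-down server would send.
def graceful_shutdown_frames(highest_processed_stream)
  [
    # 1. Warn the peer that shutdown is imminent, without rejecting
    #    any stream id yet (2^31-1 is the maximum stream identifier).
    { type: :goaway, last_stream: MAX_STREAM_ID, error: :no_error },
    # 2. After at least one round-trip time, commit to the real last
    #    stream id so in-flight stream creation isn't lost.
    { type: :goaway, last_stream: highest_processed_stream, error: :no_error }
  ]
end
```

The point of the first frame is purely to open a window in which the peer stops initiating streams; only the second frame fixes which streams will actually be honored.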

@HoneyryderChuck (Collaborator, Author)

I see. What do you suggest as the best approach then? Should one ignore all frames after a connection has been closed, or is there anything one needs to account for?

@igrigorik (Owner)

I think the answer lies in the second paragraph I quoted. We shouldn't accept frames indefinitely without raising an error. We need to allow a small window (similar to what we did for Connection) so that in-flight frames (due to the race condition) can settle on both ends.
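A rough sketch of that grace-window idea (purely illustrative; not the actual fix that landed in #140): after sending GOAWAY, tolerate in-flight frames for a short period instead of raising immediately.

```ruby
# Toy grace-window model (illustrative only, not the merged fix).
class ToyConnection
  ProtocolError = Class.new(StandardError)

  GRACE_PERIOD = 15 # seconds; an arbitrary implementation choice

  def initialize
    @state = :open
    @closed_at = nil
  end

  def goaway
    @state = :closed
    @closed_at = Time.now
  end

  def receive(frame)
    if @state == :closed
      # Tolerate frames that were already in flight when we closed,
      # then start rejecting once the window has elapsed.
      return :ignored if Time.now - @closed_at <= GRACE_PERIOD
      raise ProtocolError, "frame received on closed connection"
    end
    frame
  end
end
```

With this model, the PING from the original report would be silently dropped rather than blowing up the connection.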

@HoneyryderChuck (Collaborator, Author)

I see. So the 15-second timeout is not something enforced by the spec, but rather a mitigation for this type of issue?

@igrigorik (Owner)

No, that's an arbitrary timeout we picked for our implementation. The spec points out that this is a problem implementers should consider, but defers the implementation details.
