feat(relay): don't close connections upon errors in relay server #4718
Thank you for the work.
Can you add a test that ensures #4752 is fixed with this pull request, i.e. that a connection is not closed even though the remote does not support the relay protocol as expected?
To properly test this, I think we also need to merge the fix for the client side (#4745); otherwise, the client will simply close the connection. Happy to add a test once both PRs have landed.
14ee70d to 34c8f71
This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏
To make a reservation with a relay, a user calls `Swarm::listen_on` with an address of the relay, suffixed with a `/p2p-circuit` protocol. Similarly, to establish a circuit to another peer, a user needs to call `Swarm::dial` with such an address. Upon success, the `Swarm` issues a `SwarmEvent::NewListenAddr` event for a successful reservation or a `SwarmEvent::ConnectionEstablished` event for a successful connection.

The story is different for errors. Somewhat counterintuitively, the actual reason for an error during these operations is only reported as a `relay::Event`, without a direct correlation to the user's `Swarm::listen_on` or `Swarm::dial` calls.

With this PR, we send these errors back "into" the `Transport` and report them as `SwarmEvent::ListenerClosed` or `SwarmEvent::OutgoingConnectionError`. This is conceptually more correct. Additionally, by sending these errors back to the transport, we no longer use `ConnectionHandlerEvent::Close`, which closes the underlying relay connection entirely. If the connection is not used for anything else, it will be closed by the keep-alive algorithm.

Resolves: #4717. Related: #3591. Related: #4718. Pull-Request: #4745.
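As a rough illustration of the two address shapes described above, the sketch below builds them as plain strings. The helper names `reservation_addr` and `circuit_addr` are hypothetical and not part of libp2p, and the peer IDs are placeholders. Real code would build a `Multiaddr` instead of formatting strings.

```rust
/// Hypothetical helper: the address a user would pass to `Swarm::listen_on`
/// to request a reservation. It is the relay's address, followed by the
/// relay's peer ID and the `/p2p-circuit` suffix.
fn reservation_addr(relay_addr: &str, relay_peer_id: &str) -> String {
    format!("{}/p2p/{}/p2p-circuit", relay_addr, relay_peer_id)
}

/// Hypothetical helper: the address a user would pass to `Swarm::dial` to
/// reach `dst_peer_id` through the relay. It extends the reservation address
/// with the destination's peer ID.
fn circuit_addr(relay_addr: &str, relay_peer_id: &str, dst_peer_id: &str) -> String {
    format!(
        "{}/p2p/{}",
        reservation_addr(relay_addr, relay_peer_id),
        dst_peer_id
    )
}
```

With this PR, a failure for either address would surface as `SwarmEvent::ListenerClosed` or `SwarmEvent::OutgoingConnectionError` respectively, rather than only as a `relay::Event`.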
This PR implements the long-awaited design of disallowing `ConnectionHandler`s from closing entire connections. Instead, users should close connections via `ToSwarm::CloseConnection` from a `NetworkBehaviour` or - even better - from the `Swarm` via `close_connection`. A `NetworkBehaviour` also does not have a "full" view of how a connection is used, but at least it can use the `ConnectionId` to tell whether it created the connection.

In general, the more modular and friendly approach is to stop "using" a connection if a particular protocol no longer needs it. As a result of the keep-alive algorithm, such a connection is then closed automatically.

Depends-on: #4745. Depends-on: #4718. Depends-on: #4749. Related: #3353. Related: #4714. Resolves: #3591. Pull-Request: #4755.
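The idea of "stop using a connection and let keep-alive close it" can be sketched with a toy model. `ConnectionModel` and its methods are hypothetical and are not libp2p types. The model also assumes that active streams are the only thing keeping a connection alive, which simplifies whatever the real swarm does.

```rust
/// Hypothetical model of a connection; not a libp2p type.
#[derive(Debug, Default)]
struct ConnectionModel {
    /// Number of streams that protocols are currently holding on to.
    active_streams: usize,
}

impl ConnectionModel {
    /// A protocol starts using the connection.
    fn open_stream(&mut self) {
        self.active_streams += 1;
    }

    /// A protocol stops using the connection, e.g. after a stream error.
    fn drop_stream(&mut self) {
        self.active_streams = self.active_streams.saturating_sub(1);
    }

    /// Assumption of this sketch: the connection is kept alive only while
    /// at least one stream is still in use.
    fn keep_alive(&self) -> bool {
        self.active_streams > 0
    }
}
```

No protocol closes the connection here. Each protocol only drops its own streams, and the connection becomes eligible for closing once nobody uses it.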
Description
To remove the usages of `ConnectionHandlerEvent::Close` from the relay-server, we unify what used to be called `CircuitFailedReason` and `FatalUpgradeError`. Whilst the errors may be fatal for the particular circuit, they are not necessarily fatal for the entire connection.
Related: #3591.
Resolves: #4716.
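The distinction between "fatal for the circuit" and "fatal for the connection" can be illustrated with a small model. Everything in this sketch is hypothetical: `CircuitError`, its variants, `RelayConnection` and `fail_circuit` are made up for illustration and do not mirror the error type or handler in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical unified error for a single circuit; the variants are
/// invented for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum CircuitError {
    Timeout,
    ProtocolViolation(String),
}

/// Hypothetical model of one relay connection carrying several circuits.
#[derive(Debug)]
struct RelayConnection {
    /// Circuit ID mapped to a label for its destination.
    circuits: HashMap<u32, String>,
    /// Failures that get reported instead of closing the connection.
    failures: Vec<(u32, CircuitError)>,
    /// Whether the underlying connection is still open.
    open: bool,
}

impl RelayConnection {
    fn new() -> Self {
        RelayConnection {
            circuits: HashMap::new(),
            failures: Vec::new(),
            open: true,
        }
    }

    fn add_circuit(&mut self, id: u32, dst: &str) {
        self.circuits.insert(id, dst.to_string());
    }

    /// Tears down only the failed circuit and records the error. The
    /// connection itself is left untouched. Returns `false` if the circuit
    /// ID is unknown.
    fn fail_circuit(&mut self, id: u32, err: CircuitError) -> bool {
        if self.circuits.remove(&id).is_some() {
            self.failures.push((id, err));
            true
        } else {
            false
        }
    }
}
```

In this model, a failing circuit never flips `open` to `false`, so the other circuits on the same connection keep working.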
Notes & open questions
Should we do some kind of "smart" connection management upon failures on the streams further up? At the moment, we don't expose the details of which connection a stream failed on. I am leaning towards saying "no" here and instead relying more on #4595 (fix(swarm): keep connections alive while active streams exist). Once we do more automated keep-alive tracking, bad connections will close automatically much more aggressively. That is because any error on a stream will lead to the user dropping the stream, which means we will automatically return `KeepAlive::No`.
Change checklist