
Clientside WebSocket keepalive and the inevitability of a new protocol layer #127

Open
noahlevenson opened this issue Mar 15, 2023 · 4 comments


@noahlevenson (Contributor)

The challenge of WebSocket keepalive has yet again illustrated why a new protocol layer to implement Broflake concepts seems inevitable. It's worth discussing the background:

To reduce latency for censored end users, uncensored clients should be able to open WebSocket connections to the egress server long before they know they have any bytes to transport. This means uncensored clients may create as-yet-unused WebSocket connections which appear idle to middleboxes. We observe middleboxes closing these connections, which produces disconnect/reconnect ("discon/recon") loops: uncensored clients create new WebSocket connections, detect their closure, and reconnect, oscillating every 60 seconds or so.

This is easily mitigated with a WebSocket keepalive, and the protocol's built-in ping/pong frames are the natural way to accomplish it. Ideally, ping would be implemented on the clientside, distributing the keepalive work across connected clients rather than centralizing it at the egress server.

However, browser clients cannot initiate WebSocket pings, since ping/pong is not exposed by the JavaScript WebSocket API. This leaves us with several possible solutions:

  1. Ping from the server instead of the client
  2. Try to send an unnecessary QUIC frame or some other garbage over the WebSocket link as a keepalive
  3. Roll our own ping/pong protocol
  4. Do not allow pre-allocation of WebSocket connections, at the cost of increased latency for censored end users
  5. Do nothing, and tolerate the discon/recon loops

We have currently opted for solution 1, since we already had sufficiently low-level access to WebSocket reads and writes to implement a relatively optimized serverside keepalive in just a few lines of code.
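
For illustration, here's a minimal sketch of what that serverside keepalive might look like, assuming gorilla/websocket for the WebSocket plumbing (the actual implementation may differ); the interval and timeout values are placeholders chosen to stay under the ~60-second middlebox idle window:

```go
package egress

import (
	"time"

	"github.com/gorilla/websocket"
)

const (
	pingInterval = 15 * time.Second // placeholder: comfortably under the ~60s idle window we observe
	pongWait     = 45 * time.Second // how long we tolerate silence before giving up on the connection
)

// keepalive pings one connected client on an interval. Each pong pushes the
// read deadline forward; a silent connection eventually times out and is torn
// down by the read loop.
func keepalive(conn *websocket.Conn, done <-chan struct{}) {
	conn.SetReadDeadline(time.Now().Add(pongWait))
	conn.SetPongHandler(func(string) error {
		return conn.SetReadDeadline(time.Now().Add(pongWait))
	})

	ticker := time.NewTicker(pingInterval)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			deadline := time.Now().Add(5 * time.Second)
			if err := conn.WriteControl(websocket.PingMessage, nil, deadline); err != nil {
				return // write failed; the read loop will observe the dead connection
			}
		case <-done:
			return
		}
	}
}
```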

But for optimal scalability, we really ought to move this logic to the client. This means rolling our own ping/pong protocol.

Rolling our own ping/pong protocol means introducing Broflake control frames and a Broflake header, which requires a new protocol layer between WebSocket and QUIC that must be demuxed at the egress server. This layer is also where we'd implement the Broflake handshake (for version compatibility enforcement and future extensibility), and it's where we'd implement a solution for the deferred problem of backrouting in a multi-hop network.
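
To make the shape of that layer concrete, here's a hypothetical sketch of a Broflake frame header and its demux. The frame types, header layout, and names are illustrative assumptions, not a spec:

```go
package broflakeproto

import (
	"encoding/binary"
	"errors"
)

// FrameType distinguishes opaque QUIC payloads from Broflake control traffic.
type FrameType uint8

const (
	FrameData FrameType = iota // payload is an opaque QUIC datagram, passed through to the QUIC layer
	FramePing                  // clientside keepalive
	FramePong                  // keepalive response
	FrameHello                 // handshake: version compatibility enforcement and future extensibility
)

// Header layout (illustrative): 1 byte type | 1 byte version | 2 bytes big-endian payload length.
const headerLen = 4

// Encode prepends the Broflake header to a payload.
func Encode(t FrameType, version uint8, payload []byte) []byte {
	buf := make([]byte, headerLen+len(payload))
	buf[0] = byte(t)
	buf[1] = version
	binary.BigEndian.PutUint16(buf[2:4], uint16(len(payload)))
	copy(buf[headerLen:], payload)
	return buf
}

// Decode demuxes one frame, e.g. at the egress server: control frames are
// consumed by this layer, data frames are handed to QUIC untouched.
func Decode(b []byte) (FrameType, uint8, []byte, error) {
	if len(b) < headerLen {
		return 0, 0, nil, errors.New("short frame")
	}
	t, v := FrameType(b[0]), b[1]
	n := int(binary.BigEndian.Uint16(b[2:4]))
	if len(b) < headerLen+n {
		return 0, 0, nil, errors.New("truncated payload")
	}
	return t, v, b[headerLen : headerLen+n], nil
}
```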

See also:

https://github.com/getlantern/product/issues/37

#16

@noahlevenson mentioned this issue Oct 27, 2023
@myleshorton (Contributor)

> But for optimal scalability, we really ought to move this logic to the client.

What would make that more scalable?

@noahlevenson (Contributor, Author)

@myleshorton With the logic in the server, the server is responsible for maintaining state and sending network requests for N keepalives. N grows with the number of connected clients. Implementation-wise, it's just a little timeout check on last received data that's associated with each connected client. But distributing that logic to the clients would remove the burden from the server entirely.

@myleshorton (Contributor)

Got it. Is that actually measured to be a performance bottleneck though?

@noahlevenson (Contributor, Author)

Nah, it's relatively minor compared to our other scalability concerns.
