Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataStorm: Fixes for Session Establishment and Reconnection #3333

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

pepone
Copy link
Member

@pepone pepone commented Jan 10, 2025

This commit addresses two issues in the session establishment and reconnection process:

  • Session Reconnection Failure

A bug prevented proper recovery from connection failures. Specifically, when a publisher session sent the confirmCreateSession request via a forwarder using a new connection (different from the one used to receive the createSession request), the NodeForwarder dispatching the request was unaware of the subscriber session. If the publisher’s connection to the forwarder was closed before the subscriber sent any additional messages, the forwarder could not notify the subscriber of the publisher’s disconnection.

As a result, if the publisher attempted to reconnect, the subscriber would ignore the request, thinking the session was still active.

  • Incorrect Use of Forwarding Proxies for Disconnect Requests

Disconnect requests from the forwarder were sent using a forwarding proxy, which could cause unexpected behavior, such as requests being dispatched in the wrong order by the target node. Forwarding proxies are meant to be used by nodes connected to the forwarder, not by the forwarder itself.

This caused issues where confirmCreateSession was sent before disconnected, even if the forwarder invoked them in the correct reverse order. This happened because disconnected went through the forwarder, while confirmCreateSession was sent directly.

Replaces #3310 fixes #3309

This commit addresses two issues in the session establishment and reconnection process:

- **Session Reconnection Failure**

A bug prevented proper recovery from connection failures. Specifically, when a publisher
session sent the `confirmCreateSession` request via a forwarder using a new connection
(different from the one used to receive the `createSession` request), the `NodeForwarder`
dispatching the request was unaware of the subscriber session. If the publisher’s
connection to the forwarder was closed before the subscriber sent any additional messages,
the forwarder could not notify the subscriber of the publisher’s disconnection.

As a result, if the publisher attempted to reconnect, the subscriber would ignore the
request, thinking the session was still active.

- **Incorrect Use of Forwarding Proxies for Disconnect Requests**

Disconnect requests from the forwarder were sent using a forwarding proxy, which could
cause unexpected behavior, such as requests being dispatched in the wrong order by the
target node. Forwarding proxies are meant to be used by nodes connected to the forwarder,
not by the forwarder itself.

This caused issues where `confirmCreateSession` was sent before `disconnected`, even if
the forwarder invoked them in the correct reverse order. This happened because
`disconnected` went through the forwarder, while `confirmCreateSession` was sent directly.
cpp/src/DataStorm/Contract.ice Outdated Show resolved Hide resolved
/// @param session The subscriber session being created. This proxy is never null.
/// @param fromRelay Indicates whether the session is being created from a relay node.
/// @param subscriberIsHostedOnRelay Specifies if the relay is hosting a forwarder for the subscriber. If the
/// subscriber has endpoints or an adapter ID, the relay does not host a forwarder, and the publisher is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really saying "is not a well known proxy"? Can we just look at the proxy to figure this out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description is indeed confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Can we just look at the proxy to figure this out?"

No, the forwarder replaces the well-known proxy with a proxy that has the forwarder endpoints. And the forwarder will forward request to it over its connection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like this

subscriberIsHostedOnRelay true when fromRelay is true and the proxy passed to the relay is a well-known
proxy. In this case the relay replaces the well-known proxy with a direct proxy to the subscriber forwarder hosted by the relay.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe rename it to subscriberIsWellKnown too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This operation doc-comment talks about subscriber node, publisher node, relay and then about 'forwarder' in the doc-comment for this parameter. It's not clear what this forwarder is.

/// @param fromRelay Indicates whether the session is being created from a relay node.

Meaning, it's not a subscriber node that sends this request directly to the publisher node, but it's going through an intermediary node - a relay node?

subscriberIsHostedOnRelay true when fromRelay is true and the proxy passed to the relay is a well-known
proxy.

Not clear which proxy you're talking about here - the subscriber proxy?

In this case the relay replaces the well-known proxy with a direct proxy to the subscriber forwarder hosted by the relay.

So if the subscriber proxy is well-known ... but then why do we need this parameter? The servant can figure out if the subscriber proxy it receives is well-known.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear which proxy you're talking about here - the subscriber proxy?

Both the NodePrx and the SessionPrx proxy.

So if the subscriber proxy is well-known ... but then why do we need this parameter?

When the intermediary node receives a well-know proxy, it replaces it with a direct proxy before forwarding the request.

The publisher receiving the forwarded request, must use a fixed proxy for sending the confirmCreateSession request. This ensures the confirmation doesn't establish a new connection with the forwarder endpoint.

The servant can figure out if the subscriber proxy it receives is well-known.

I guess one option is to look a the poxy category which is either "sf" or "s". The "sf" category is used for subscriber forwarders, and "s" for subscribers.

cpp/src/DataStorm/NodeSessionManager.cpp Outdated Show resolved Hide resolved
cpp/src/DataStorm/NodeSessionManager.cpp Outdated Show resolved Hide resolved
/// reader.
/// - The publisher node has previously send a initiateCreateSession request.
///
/// The publisher node dispatching this request would send a confirmCreateSession request to the subscriber node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write it as:
The publisher node dispatching this request sends a confirmCreationSession ... (or maybe "then sends", but no would).

/// @param session The subscriber session being created. This proxy is never null.
/// @param fromRelay Indicates whether the session is being created from a relay node.
/// @param subscriberIsHostedOnRelay Specifies if the relay is hosting a forwarder for the subscriber. If the
/// subscriber has endpoints or an adapter ID, the relay does not host a forwarder, and the publisher is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description is indeed confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataStorm/reliability test hang
3 participants