Eventual consistency of hostname information in Valkey Cluster #304

hpatro · 2024-04-11T22:22:16Z

Problem

Hostnames were introduced in Redis 7.0. In cluster mode they are propagated via extensions and are only eventually consistent, so a node that already supports hostnames can still be shown with just its IP address, and clients need to handle that behavior correctly. Below I outline the scenarios in which hostname information can be stale and needs a few round trips between nodes before it is complete.

Scenario 1: Behavior prior to #52:

Scenario 2: Current behavior on unstable:

In both scenarios above, hostname information eventually propagates throughout the cluster. However, with the current behavior, CLUSTER SLOTS/SHARDS can also display a node without its hostname, whereas previously this was visible only in CLUSTER NODES.

Possible Solution(s)

If we want all of the APIs to show nodes with correct hostname information, I think there are the following options:

1. If a node has a hostname associated with it, send hostnames as part of extensions. With this change, CLUSTER NODES output will still be wrong, as explained in scenario 1.
2. If a node has a hostname associated with it, filter out nodes without a hostname from the CLUSTER NODES/SLOTS/SHARDS response.
3. Continue as is and let the client eventually determine the correct endpoint to connect to.

I'm inclined towards 2. It handles the eventual consistency of the system well; however, it doesn't expose a node to clients until we have received complete information about it.
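To make the filtering alternative concrete, here is a minimal Python sketch of the idea. It is not the server implementation: `NodeView`, `visible_nodes`, and the sample addresses are hypothetical names invented for illustration, and the real logic would live in the cluster code that builds the CLUSTER NODES/SLOTS/SHARDS replies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeView:
    """Hypothetical view of a peer as known to the local node."""
    node_id: str
    ip: str
    port: int
    hostname: Optional[str] = None  # None until the hostname extension arrives


def visible_nodes(local_has_hostname: bool, peers: list[NodeView]) -> list[NodeView]:
    """If the local node has a hostname, hide peers whose hostname is still unknown."""
    if not local_has_hostname:
        # Hostnames are not in use locally, so every peer is reported as before.
        return list(peers)
    return [p for p in peers if p.hostname]


peers = [
    NodeView("a1", "10.0.0.1", 6379, "node-a.example.com"),
    NodeView("b2", "10.0.0.2", 6379),  # hostname not received yet
]
print([p.node_id for p in visible_nodes(True, peers)])  # ['a1']
```

The trade-off is visible in the example: node `b2` is reachable by IP but stays hidden from clients until its hostname has been received.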


hpatro · 2024-04-11T22:23:55Z

@madolson @PingXie @srgsanky Feel free to add your thoughts.

PingXie · 2024-04-11T22:39:26Z

Before we start discussing the solutions, can you please help articulate the value of having this "property" or the impact of lacking it? see my comment

hpatro · 2024-04-11T22:52:11Z

> Before we start discussing the solutions, can you please help articulate the value of having this "property" or the impact of lacking it? see my comment

Regarding #52 (comment)

> First and foremost, I am not sure about the importance/value of this property of "a node will never observe a node that doesn't have a hostname". This already can happen during rolling upgrades from pre 7.0 builds to 7.2 and all nodes will have to handle it.

The hostname feature won't be enabled until all the nodes in the cluster have been upgraded (this was a choice made when the feature launched). If the hostname feature had been enabled during a rolling upgrade, it would have failed due to the extensions-not-supported issue, which we fixed recently.

> Furthermore, the propagation of many other node attributes don't have this property either today. Even within the same node, the cluster and the replication modules might have a different view on the "primary-replica" relationship too (for a very short period of time after cluster replicate).

I agree with this point. It's an eventually consistent system by design, and we don't have a mechanism to guarantee a consistent view across the cluster. However, in this particular case, IIUC, it can cause issues on the client side during topology updates and node connection. @madolson can highlight more.

> My point is that, with the current cluster design, the operator would have to perform an explicit synchronization regardless (think of wait_for_cluster_propagation). So a one-off effort to provide this property (of "a node will never observe a node that doesn't have a hostname") is not going to change the landscape much but with the downside of the excessive complexity (to a part of the system where we already are grappling with reliability issues).

Hence, I have option 3 in my proposal as well 😉. I wanted to document the scenario(s) for better understanding and so that all of us are on the same page. We can converge on a solution here (if any).
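For readers unfamiliar with the "explicit synchronization" mentioned in the quote, a rough Python sketch of the idea follows. It is not the `wait_for_cluster_propagation` helper itself: `views_converged`, `wait_for_hostname_propagation`, and the `fetch_views` callable are hypothetical, and each view is assumed to be a `{node_id: hostname}` map as seen by one node.

```python
import time


def views_converged(views):
    """True when every node reports the same map and no hostname is missing."""
    views = list(views)
    if not views:
        return True
    first = views[0]
    return all(v == first for v in views) and all(first.values())


def wait_for_hostname_propagation(fetch_views, timeout=5.0, interval=0.1):
    """Poll fetch_views() until all views agree or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if views_converged(fetch_views()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stubbed views: the first poll is stale, the second has converged.
stale = [
    {"a1": "node-a.example.com", "b2": ""},
    {"a1": "node-a.example.com", "b2": "node-b.example.com"},
]
fresh = [{"a1": "node-a.example.com", "b2": "node-b.example.com"}] * 2
responses = iter([stale, fresh])
print(wait_for_hostname_propagation(lambda: next(responses), timeout=1.0, interval=0.01))  # True
```

In a real cluster, `fetch_views` would query each node for its topology; here it is stubbed so the sketch runs without a server.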

srgsanky · 2024-04-12T03:16:11Z

I am curious why advertising IP and switching to hostname is bad for a client. For a TLS cluster, will the client fail to establish connection if they see an IP?

hpatro · 2024-04-17T18:22:32Z

> I am curious why advertising IP and switching to hostname is bad for a client. For a TLS cluster, will the client fail to establish connection if they see an IP?

The other issue I've heard of is that clients might maintain duplicate entries for a single node, keyed by endpoint, and will need to reconcile the information at a later point.

madolson · 2024-04-18T17:18:19Z

> I am curious why advertising IP and switching to hostname is bad for a client. For a TLS cluster, will the client fail to establish connection if they see an IP?

Hostname verification is one issue. The other is that clients often use the endpoint:port as the key to map into the node, so they often have poor logic for handling that rename, since it hasn't historically been an issue. The OSS project has very little testing or validation of these cases.
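As an illustration of the keying problem, here is a small Python sketch of a client-side table keyed by node ID instead of endpoint:port, so that an IP-to-hostname switch updates one entry rather than creating a duplicate. `NodeTable` and its methods are hypothetical and not taken from any existing client; the sketch assumes the client can read a stable node ID from the topology response.

```python
class NodeTable:
    """Hypothetical client-side table keyed by cluster node ID."""

    def __init__(self):
        self._endpoint_by_id = {}

    def update(self, node_id, endpoint, port):
        """Record the latest endpoint; return True if a known node was renamed."""
        new = (endpoint, port)
        old = self._endpoint_by_id.get(node_id)
        self._endpoint_by_id[node_id] = new
        return old is not None and old != new

    def endpoints(self):
        """Return the current (endpoint, port) pairs, one per node."""
        return sorted(self._endpoint_by_id.values())


table = NodeTable()
table.update("a1", "10.0.0.1", 6379)  # first seen by IP
renamed = table.update("a1", "node-a.example.com", 6379)  # hostname arrives later
print(renamed, table.endpoints())  # True [('node-a.example.com', 6379)]
```

A table keyed by endpoint:port would hold two entries for the same node after the second update, which is the duplicate-entry reconciliation problem described above.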

daniel-house mentioned this issue Apr 16, 2024

Should functions be synchronized among cluster nodes? #58

Open
