Eventual consistency of hostname information in Valkey Cluster #304

Open
hpatro opened this issue Apr 11, 2024 · 6 comments

@hpatro
Contributor

hpatro commented Apr 11, 2024

Problem

Hostnames were introduced in Redis 7.0. Because hostnames reach other nodes through cluster bus extensions and are only eventually consistent, a node may be shown with its IP address even though it already has a hostname, and clients need to handle this behavior correctly. Below I've outlined the scenarios in which hostname information can be stale and require a few round trips between nodes before the complete information is available.

Scenario 1: Behavior prior to #52:

[Diagram for Scenario 1 not reproduced]

Scenario 2: Current behavior on unstable:

[Diagram for Scenario 2 not reproduced]

In both scenarios, hostname information eventually propagates throughout the cluster. However, with the current behavior, CLUSTER SLOTS/SHARDS can display a node without its hostname, whereas previously such a node was only displayed that way in CLUSTER NODES.

Possible Solution(s)

If we want all the APIs to show nodes with correct hostname information, I think there are two alternatives, plus a third option of keeping the current behavior:

  1. If a node has a hostname associated with it, send the hostname as part of the extensions. With this change, CLUSTER NODES output will still be wrong, as explained in Scenario 1.
  2. If a node has a hostname associated with it, filter out nodes without a hostname from the CLUSTER NODES/SLOTS/SHARDS responses.
  3. Keep the current behavior and let the client eventually determine the correct endpoint to connect to.

I'm inclined towards option 2. It handles the eventual consistency of the system well; however, it doesn't expose a node to clients until complete information about that node has been received.
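
To make option 2 concrete, here is a minimal sketch of the filtering rule, assuming "a node has a hostname" refers to the node serving the response (the local node) and that each peer entry carries an optional hostname that stays unset until the extension arrives. `NodeView` and `visible_nodes` are hypothetical names for illustration, not Valkey code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeView:
    """Hypothetical, simplified view of a peer as known to the local node."""
    node_id: str
    ip: str
    port: int
    hostname: Optional[str] = None  # None until the hostname extension has arrived


def visible_nodes(local_hostname: Optional[str], peers: List[NodeView]) -> List[NodeView]:
    """Option 2 (assumed interpretation): if the local node has a hostname,
    hide peers whose hostname has not been received yet, so clients never
    see an IP-only entry for them."""
    if local_hostname is None:
        # Hostnames are not in use on this node; show every peer as before.
        return list(peers)
    return [p for p in peers if p.hostname is not None]
```

For example, with peers `a` (hostname known) and `b` (hostname not yet received), a node with its own hostname would report only `a` until `b`'s extension arrives. The trade-off mentioned above is visible here: `b` is simply absent from the topology for a while.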

@hpatro
Contributor Author

hpatro commented Apr 11, 2024

@madolson @PingXie @srgsanky Feel free to add your thoughts.

@PingXie
Member

PingXie commented Apr 11, 2024

Before we start discussing the solutions, can you please help articulate the value of having this "property" or the impact of lacking it? see my comment

@hpatro
Contributor Author

hpatro commented Apr 11, 2024

> Before we start discussing the solutions, can you please help articulate the value of having this "property" or the impact of lacking it? see my comment

Regarding #52 (comment)

> First and foremost, I am not sure about the importance/value of this property of "a node will never observe a node that doesn't have a hostname". This already can happen during rolling upgrades from pre 7.0 builds to 7.2 and all nodes will have to handle it.

The hostname feature won't be enabled until all the nodes in the cluster have been upgraded (this was a choice made when the feature launched). If the hostname feature had been enabled during rolling upgrades, it would have failed due to the unsupported-extensions issue, which we fixed recently.

> Furthermore, the propagation of many other node attributes don't have this property either today. Even within the same node, the cluster and the replication modules might have a different view on the "primary-replica" relationship too (for a very short period of time after cluster replicate).

I agree with this point. It's an eventually consistent system design and doesn't have a mechanism to guarantee a consistent view across the cluster. However, in this particular case, IIUC, it can cause issues on the client side during topology updates and node connections. @madolson can elaborate.

> My point is that, with the current cluster design, the operator would have to perform an explicit synchronization regardless (think of wait_for_cluster_propagation). So a one-off effort to provide this property (of "a node will never observe a node that doesn't have a hostname") is not going to change the landscape much but with the downside of the excessive complexity (to a part of the system where we already are grappling with reliability issues).

Hence, I've included option 3 in my proposal as well 😉. I wanted to document the scenarios so that we all have a better understanding and are on the same page. We can converge on a solution here (if any).
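
For the hostname case, the explicit synchronization an operator would perform (in the spirit of wait_for_cluster_propagation) might look like the polling sketch below. It assumes a hypothetical `fetch_views` callable that returns, for every node, that node's view of each peer's hostname (`None` if not yet known); how those views are collected (for example by parsing CLUSTER NODES on each node) is left out. It only checks that every known peer has some hostname, not that every node knows every peer or that the hostnames agree.

```python
import time
from typing import Callable, Mapping, Optional

# Hypothetical shape: {observer_node_id: {peer_node_id: hostname or None}}
ClusterViews = Mapping[str, Mapping[str, Optional[str]]]


def hostnames_converged(views: ClusterViews) -> bool:
    """True when every node reports a hostname for every peer it knows about."""
    return all(
        hostname is not None
        for peer_view in views.values()
        for hostname in peer_view.values()
    )


def wait_for_hostname_propagation(
    fetch_views: Callable[[], ClusterViews],
    timeout: float = 10.0,
    interval: float = 0.1,
) -> bool:
    """Poll fetch_views() until hostnames have converged; False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if hostnames_converged(fetch_views()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling from the outside keeps the cluster itself unchanged, which matches the argument above that adding a one-off consistency guarantee inside the cluster bus would add complexity without removing the need for this kind of check.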

@srgsanky
Contributor

I am curious why advertising IP and switching to hostname is bad for a client. For a TLS cluster, will the client fail to establish connection if they see an IP?

@hpatro
Contributor Author

hpatro commented Apr 17, 2024

> I am curious why advertising IP and switching to hostname is bad for a client. For a TLS cluster, will the client fail to establish connection if they see an IP?

Another issue I've heard of is that clients might key their node entries by endpoint, so a single node can end up with duplicate entries (one for its IP, one for its hostname) that need to be reconciled at a later point.

@madolson
Member

> I am curious why advertising IP and switching to hostname is bad for a client. For a TLS cluster, will the client fail to establish connection if they see an IP?

Hostname verification is one issue. The other is that clients often use endpoint:port as the key for looking up a node, so they often have poor logic for handling a rename from IP to hostname, since it hasn't historically been an issue. The OSS project has very little testing or validation of these cases.
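
One way a client could avoid both the duplicate entries and the rename problem is to key its node table by the cluster node ID (which CLUSTER NODES and CLUSTER SHARDS report) instead of endpoint:port. The sketch below is a hypothetical client-side structure, not taken from any particular client library; it treats an IP-to-hostname change as an update of an existing entry.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ClientTopology:
    """Hypothetical client-side node table keyed by the stable cluster node ID
    rather than endpoint:port, so an endpoint change updates an existing entry
    instead of creating a duplicate node."""
    endpoints: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def upsert(self, node_id: str, endpoint: str, port: int) -> bool:
        """Record the latest endpoint for node_id.

        Returns True if an existing entry's endpoint changed, signalling that a
        connection opened to the old endpoint should be replaced.
        """
        previous = self.endpoints.get(node_id)
        self.endpoints[node_id] = (endpoint, port)
        return previous is not None and previous != (endpoint, port)
```

For example, calling `upsert("a1", "10.0.0.1", 6379)` and later `upsert("a1", "node-a.example.com", 6379)` leaves a single entry for `a1`, and the second call returns `True` so the client knows to reconnect (which also lets TLS hostname verification run against the hostname rather than the IP).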
