-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass extensions to node if extension processing is handled by it #52
Conversation
One of the test fails in few of the runs.
Taking a look. |
Regarding the flaky test. There is a slight behavior change with this PR. A successful PING/PONG is required as a sender for a given node to be marked as extensions_supported and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). (Added it to the top level comment as well) |
Signed-off-by: Harkrishn Patro <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
So this happens in two flows:
So we really just need a goos solution for 1. |
@zuiderkwast Unsure about what you meant by assigning it to Ping for this PR, did you meant it for review or something else ? |
@hpatro please continue with this PR. I will be the "designated" reviewer :-) |
Yeah, I meant as the reviewer. Sorry. |
Question: Shall we include this in our 7.2.4 release? Although it improves compatibility with older Redis versions, it's a difference compared to Redis 7.2.4. We should mark this clearly in our release notes then. |
I think we should try our best to make it happen. It is an important and fully compatible fix. |
Reminds me of the fact we should add code owner when we get this all sorted. |
Yeah, I was pushing for this because I think it's important for compatibility. I would be okay with this in like a 7.2.5 though, it's important to have eventually. |
Signed-off-by: Harkrishn Patro <[email protected]>
Changes done:
@PingXie @madolson Let me know if you agree with this. @madolson Regarding your concern of displaying IP address incorrectly for a period of time, I think we should decouple it from this issue. And I think that issue was valid in 7.0 itself. Can discuss further. |
Signed-off-by: Harkrishn Patro <[email protected]>
@PingXie Thanks for your feedback. I initially started off this PR (in Redis) with similar setup you suggested (fixing the issue and manually tested it). The fake testing approach seems to bring in a bit of complexity and albeit it wasn't perfect (tried to get it out in a hurry). I've trimmed down the code, please have a look now. |
Signed-off-by: Harkrishn Patro <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
Tested few things Setup
Operations
|
Signed-off-by: Harkrishn Patro <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a lot cleaner and maintainable to me. Thanks a lot for simplifying the fix, @hpatro!
The server code LGTM but I think there is room for improvement in the test code. Can you please take another look?
Signed-off-by: Harkrishn Patro <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Why do you think this? I think this is a regression if you have hostnames enabled and have a new node join a cluster, as for a brief period of time the nodes will incorrect show up as not support hostnames (even though they do). We've seen issues in the past where clients don't handle this correctly, and some clients see prolonged periods without being able to resolve it. My concern is that it affects new nodes added during steady state, and not just the upgrade case. |
Exactly, this issue exists for a new node being added and not related to the upgrade scenario. So, it could happen for any cluster running with hostnames feature. |
Today, the hostname feature blindly sends the extension when it joins a new cluster, assuming it supports it. Now, with this change, we will wait until we have confirmation it supports the hostname. This is different behavior no? |
Won't that node be added via gossip as well and in the gossip information, hostname won't be present. Am I missing something? 🙈 |
That was one of the two cases we discussed, but the case I'm talking about is for meet messages. The first message a node sends to a new cluster will not include its hostname. (Maybe I'm indexing too much on this case though). |
Why is this a problem? The cluster bus is an eventual consistency system, and the information broadcast via the cluster bus including "hostname" is inherently dynamic/non-constant. so conceptually speaking this behavior change is within the design premise. |
Although generally this is true, in practice today if you set all the hostnames at startup a node will never observe a node that doesn't have a hostname. This change introduces a small window where that isn't true, and it could show it to clients and a user might see errors. I suppose I need to be prescriptive. It seems like everyone wants something simple, so I can lean into that. I would rather hostnames continue to work as they do, with nodes not checking to see if the other node supports it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides my comment I was OK with the design, so would rather have this in the RC candidate. Folks can comment.
…key-io#52) Ref: redis/redis#12760 enabled by default) with older Redis cluster (< 7.0 - extensions not handled) . With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message. This fix introduces a flag `extensions_supported` on the clusterMsg indicating the sender node can handle extensions parsing. Once, a receiver nodes receives a message with this flag set to 1, it would update clusterNode new field extensions_supported and start sending out extensions if it has any. This PR also introduces a DEBUG sub command to enable/disable cluster message extensions `process-clustermsg-extensions` feature. Note: A successful `PING`/`PONG` is required as a sender for a given node to be marked as `extensions_supported` and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). TCL test verifying the cluster state is healthy irrespective of enabling/disabling cluster message extensions feature. --------- Signed-off-by: Harkrishn Patro <[email protected]>
Will file an issue explaining the scenarios better and we can decide over there. |
First and foremost, I am not sure about the importance/value of this property of "a node will never observe a node that doesn't have a hostname". This already can happen during rolling upgrades from pre 7.0 builds to 7.2 and all nodes will have to handle it. Furthermore, the propagation of many other node attributes don't have this property either today. Even within the same node, the cluster and the replication modules might have a different view on the "primary-replica" relationship too (for a very short period of time after My point is that, with the current cluster design, the operator would have to perform an explicit synchronization regardless (think of |
The important point here is a node can't have a hostname pre 7.0, it was an opt-in feature. |
…key-io#52) Ref: redis/redis#12760 ### Description #### Fixes compatibilty of PlaceholderKV cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) . With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message. This fix introduces a flag `extensions_supported` on the clusterMsg indicating the sender node can handle extensions parsing. Once, a receiver nodes receives a message with this flag set to 1, it would update clusterNode new field extensions_supported and start sending out extensions if it has any. This PR also introduces a DEBUG sub command to enable/disable cluster message extensions `process-clustermsg-extensions` feature. Note: A successful `PING`/`PONG` is required as a sender for a given node to be marked as `extensions_supported` and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). ### Testing TCL test verifying the cluster state is healthy irrespective of enabling/disabling cluster message extensions feature. --------- Signed-off-by: Harkrishn Patro <[email protected]>
) This PR is based on the commits from PR valkey-io/valkey#52. Ref: #12760 Close #13401 This PR will replace #13449 Fixes compatibilty of Redis cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) . With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message. This fix does the following things: 1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet phase, we do not know whether the connected node supports ext data, we need to make sure that it knows and send back its ext data if it has. 2. If another node does not support ext data, we will not send it ext data to avoid the handshake failure due to the incorrect payload length. Note: A successful `PING`/`PONG` is required as a sender for a given node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). --------- Signed-off-by: Harkrishn Patro <[email protected]> Co-authored-by: Harkrishn Patro <[email protected]> --------- Signed-off-by: Harkrishn Patro <[email protected]> Co-authored-by: Harkrishn Patro <[email protected]>
…is#13465) This PR is based on the commits from PR valkey-io/valkey#52. Ref: redis#12760 Close redis#13401 This PR will replace redis#13449 Fixes compatibilty of Redis cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) . With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message. This fix does the following things: 1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet phase, we do not know whether the connected node supports ext data, we need to make sure that it knows and send back its ext data if it has. 2. If another node does not support ext data, we will not send it ext data to avoid the handshake failure due to the incorrect payload length. Note: A successful `PING`/`PONG` is required as a sender for a given node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). --------- Signed-off-by: Harkrishn Patro <[email protected]> Co-authored-by: Harkrishn Patro <[email protected]> --------- Signed-off-by: Harkrishn Patro <[email protected]> Co-authored-by: Harkrishn Patro <[email protected]>
) This PR is based on the commits from PR valkey-io/valkey#52. Ref: #12760 Close #13401 This PR will replace #13449 Fixes compatibilty of Redis cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) . With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message. This fix does the following things: 1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet phase, we do not know whether the connected node supports ext data, we need to make sure that it knows and send back its ext data if it has. 2. If another node does not support ext data, we will not send it ext data to avoid the handshake failure due to the incorrect payload length. Note: A successful `PING`/`PONG` is required as a sender for a given node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). --------- Signed-off-by: Harkrishn Patro <[email protected]> Co-authored-by: Harkrishn Patro <[email protected]> --------- Signed-off-by: Harkrishn Patro <[email protected]> Co-authored-by: Harkrishn Patro <[email protected]>
) This PR is based on the commits from PR valkey-io/valkey#52. Ref: #12760 Close #13401 This PR will replace #13449 Fixes compatibilty of Redis cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) . With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message. This fix does the following things: 1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet phase, we do not know whether the connected node supports ext data, we need to make sure that it knows and send back its ext data if it has. 2. If another node does not support ext data, we will not send it ext data to avoid the handshake failure due to the incorrect payload length. Note: A successful `PING`/`PONG` is required as a sender for a given node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). --------- Signed-off-by: Harkrishn Patro <[email protected]> Co-authored-by: Harkrishn Patro <[email protected]> --------- Signed-off-by: Harkrishn Patro <[email protected]> Co-authored-by: Harkrishn Patro <[email protected]>
Ref: redis/redis#12760
Description
Fixes compatibilty of PlaceholderKV cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) .
With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message.
Note: A successful
PING
/PONG
is required as a sender for a given node to be marked asextensions_supported
and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s).Testing
TCL test verifying the cluster state is healthy irrespective of enabling/disabling cluster message extensions feature.