Only (re-)send MEET packet once every handshake timeout period #1441

pieturin · 2024-12-13T23:38:29Z

Add meet_sent field in clusterNode indicating the last time we sent a MEET packet. Use this field to only (re-)send a MEET packet once every handshake timeout period when detecting a node without an inbound link.

When receiving multiple MEET packets on the same link while the node is in handshake state, instead of dropping the packet, we now simply prevent the creation of a new node. This way we still process the MEET packet's gossip and reply with a PONG as any other packets.

Improve some logging messages to include human_nodename. Add nodeExceedsHandshakeTimeout() function.

This is a follow-up to this previous PR: #1307
And a partial fix to the crash described in: #1436

pieturin · 2024-12-14T01:03:06Z

@enjoy-binbin, @PingXie, please take a look at this PR when you have the chance.

enjoy-binbin

LGTM, I also thought about limiting it, but I thought we should be more robust to double packets.

Add meet_sent field in clusterNode indicating the last time we sent a MEET packet. Use this field to only (re-)send a MEET packet once every handshake timeout period. Improve some logging messages to include human_nodename. Add nodeExceedsHandshakeTimeout() function. Signed-off-by: Pierre Turin <[email protected]>

Signed-off-by: Pierre Turin <[email protected]>

src/cluster_legacy.c

When receiving multiple MEET packets on the same link while the node is in handshake state, instead of dropping the packet, we now simply prevent the creation of a new node. This way we still process the MEET packet's gossip and reply with a PONG as any other packets. Signed-off-by: Pierre Turin <[email protected]>

pieturin · 2024-12-17T00:08:33Z

In the last revision of this PR, I removed the code to drop the MEET packet from #1436 and replaced it with a check to prevent from creating a new node when receiving several MEET packets on the same link while the node is in handshake.

@enjoy-binbin, @madolson, @PingXie, please tell me if you prefer this over dropping the packet entirely.

PingXie

LGTM overall.

I'd prefer not dropping the second MEET. Not because the dropping itself is a concern but the fix is a bit ad-hoc. I feel that it is just a valid case and we should just handle it.

Another comment is on the new inbound_link_freed_time field. I feel that it is not created at the right level (of abstraction). It should not be a concern of freeClusterLink. I actually like the new meet_sent more. What do you think?

src/cluster_legacy.c

enjoy-binbin · 2024-12-17T03:31:57Z

When receiving multiple MEET packets on the same link while the node is in handshake state, instead of dropping the packet, we now simply prevent the creation of a new node. This way we still process the MEET packet's gossip and reply with a PONG as any other packets.

ok, this LGTM and it was actually my first attempt, back then i mistakenly thought that repeated MEET would eliminate the handshake state, but now that I look at the code I think it is safe.

pieturin · 2024-12-17T03:35:22Z

Another comment is on the new inbound_link_freed_time field. I feel that it is not created at the right level (of abstraction). It should not be a concern of freeClusterLink. I actually like the new meet_sent more. What do you think?

I don't think we can get rid of inbound_link_freed_time and only use meet_sent. We don't want to send a MEET packet every time the inbound connection is dropped. We should wait for the timeout period to first make sure the other side hasn't tried to re-connect with us.

Why do you think it should not be a concern of freeClusterLink()? This function already updates the node's link or inbound_link, so it seems natural for it to also update inbound_link_freed_time.

Address PR comments. Signed-off-by: Pierre Turin <[email protected]>

Node 1 might not be known from Node 0 if Node 2 does not gossip about it to Node 1. Signed-off-by: Pierre Turin <[email protected]>

Signed-off-by: Pierre Turin <[email protected]>

codecov · 2024-12-17T04:18:55Z

Codecov Report

Attention: Patch coverage is 96.87500% with 1 line in your changes missing coverage. Please review.

Project coverage is 70.87%. Comparing base (980a801) to head (a542eb5).
Report is 24 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/cluster_legacy.c	96.87%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1441      +/-   ##
============================================
+ Coverage     70.78%   70.87%   +0.08%     
============================================
  Files           119      119              
  Lines         64691    64704      +13     
============================================
+ Hits          45790    45856      +66     
+ Misses        18901    18848      -53

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`86.73% <96.87%> (-0.13%)`	⬇️

... and 27 files with indirect coverage changes

madolson · 2024-12-17T18:05:38Z

When receiving multiple MEET packets on the same link while the node is in handshake state, instead of dropping the packet, we now simply prevent the creation of a new node. This way we still process the MEET packet's gossip and reply with a PONG as any other packets.

Btw, I also agree with this. I mentioned on the other thread I think this is the better behavior, but people were seeing a lot of CI failures so I was okay with a temporary solution.

pieturin · 2024-12-17T22:20:18Z

The code coverage reports one missing line:

valkey/src/cluster_legacy.c

Line 3253 in 0ccc5cc

debugServerAssert(link->inbound && nodeInHandshake(link->node));

This line should not be reached anymore since we are waiting for the handshake timeout before re-sending a MEET packet.

PingXie · 2024-12-18T07:39:57Z

Another comment is on the new inbound_link_freed_time field. I feel that it is not created at the right level (of abstraction). It should not be a concern of freeClusterLink. I actually like the new meet_sent more. What do you think?

I don't think we can get rid of inbound_link_freed_time and only use meet_sent. We don't want to send a MEET packet every time the inbound connection is dropped. We should wait for the timeout period to first make sure the other side hasn't tried to re-connect with us.

Why do you think it should not be a concern of freeClusterLink()? This function already updates the node's link or inbound_link, so it seems natural for it to also update inbound_link_freed_time.

freeClusterLink() is invoked in various scenarios, but the new state we're introducing is only relevant in the handshake path. This suggests that the scope of the new state is broader than necessary.

On a second thought, I think a better name would be handshake_start_time. This name better reflects its purpose.

On the initiator side, handshake_start_time would be set whenever a MEET command is sent.
On the recipient side, it would be set when the first PING message is sent on the outgoing link for a node in the HANDSHAKE state.

pieturin · 2024-12-19T02:01:11Z

On a second thought, I think a better name would be handshake_start_time. This name better reflects its purpose.

I still think having this is not enough. We should still have inbound_link_freed_time to know if the link was freed recently in case it's just a temporary reconnect between the two nodes, we don't want to send a MEET packet because the handshake happened days ago, and now the inbound link is NULL for a fraction of a second. Does this make sense?
And I can't use ping_sent/pong_received/data_received since those are also updated for the outbound link.

freeClusterLink() is invoked in various scenarios, but the new state we're introducing is only relevant in the handshake path. This suggests that the scope of the new state is broader than necessary.

But we need to update inbound_link_freed_time every time the inbound link is freed in order to make sure we don't send the MEET packet on a short disconnect.

I guess a way to get rid of inbound_link_freed_time would be to replace it with another field that would track if the inbound_link was ever opened on a node, something like was_inbound_link_set.

With this, we could replace the condition with something like:

    if (nodeInNormalState(node) && node->link != NULL && node->inbound_link == NULL &&
        nodeExceedsHandshakeTimeout(node) && now - node->meet_sent > getHandshakeTimeout() &&
        !node->was_inbound_link_set) {

What do you think @PingXie?

pieturin · 2024-12-20T01:27:46Z

I had a talk offline with @PingXie, and we think we can change the condition to send a MEET packet to something like:

    if (nodeInNormalState(node) && node->link != NULL && node->inbound_link == NULL &&
        nodeExceedsHandshakeTimeout(node) &&
        now - node->handshake_start_time > getHandshakeTimeout() && now - node->handshake_start_time < getHandshakeTimeout() * 2) {

And with that we should be able to get rid of inbound_link_freed_time.

pieturin · 2024-12-20T22:25:20Z

A crash was detected recently during the daily runs: https://github.com/valkey-io/valkey/actions/runs/12403569355/job/34627291048#step:8:3854

17847:M 19 Dec 2024 00:15:58.007 - Accepting cluster node connection from 127.0.0.1:45768
17847:M 19 Dec 2024 00:15:58.021 * Error converting peer IP to string: (null)


=== VALKEY BUG REPORT START: Cut & paste starting from here ===
17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED ===
17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true

------ STACK TRACE ------

17847 valkey-server *
src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634]
src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de]
/__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e]
src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea]
src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547]
/lib64/libc.so.6(+0x40c8) [0x7f083985a0c8]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b]
src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5]

...

# Cluster info
cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:2
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_ping_sent:1
cluster_stats_messages_pong_sent:1
cluster_stats_messages_sent:2
cluster_stats_messages_meet_received:2
cluster_stats_messages_received:2
total_cluster_links_buffer_limit_exceeded:0

------ CLUSTER NODES OUTPUT ------
de11af5df0d8b71a5df786c7a1e7ab0a0beab7fc 127.0.0.1:27128@37127,,tls-port=27127,shard-id=2ea5410be1f7dcac4755a6c9bcc60558e3f4f14b handshake - 1734567357566 0 0 connected
26d986e4e07f0d2c768d62dfdaf9f0ae03d18e49 127.0.0.1:27131@27129,,tls-port=27130,shard-id=236191ce869481919bb106a49cb235538b419178 myself,master - 0 0 0 connected

My proposal after talking with @madolson is to remove this assert and instead just remove the newly created node and close the link, since if we cannot get the IP from the link it probably means the connection was closed.

valkey/src/cluster_legacy.c

Line 3274 in 1c97317

serverAssert(nodeIp2String(node->ip, link, hdr->myip) == C_OK);

pieturin · 2024-12-21T00:12:15Z

Doing the solution with handshake_start_time doesn't work with the existing tests. I see that a node is sending a MEET packet just because of a temporary disconnect during one of the cluster-reliable-meet tests. Because the other nodes frees the link and reconnects because it didn't receive any PONG for more than server.cluster_node_timeout / 2. And then the cluster-reliable-meet test fails because the node with the inconsistent view never sends the MEET (not sure why yet though). So I don't think it's worth trying to make this work. It seems more finicky than relying on inbound_link_freed_time.

When we cannot get the IP address from a link, instead of asserting we now free the link and delete the node. Signed-off-by: Pierre Turin <[email protected]>

Signed-off-by: Pierre Turin <[email protected]>

PingXie

Some minor wording changes in the code comments and logs. Thanks @pieturin!

src/cluster_legacy.c

PingXie · 2024-12-22T09:28:58Z

And then the cluster-reliable-meet test fails because the node with the inconsistent view never sends the MEET (not sure why yet though)

I think I know why. The handshake_start_time is lost in this case along with the node that is created as a response to the first MEET. The node is then recreated via the gossip route, but this new node never initiates a handshake, so its handshake_start_time is never set. Here is the possible sequence:

A cluster with two primaries, P1 and P2.
A new node N is to be added.
P1 is instructed to meet N, but P1 is also told to drop all cluster traffic from N. Therefore, from N's perspective, P1 remains in the handshake state.
Eventually, P1's representation on N, a clusterNode instance, times out and gets deleted.
However, when P1 sends out its first MEET, it also gossips about P2 to N, which shows up as a normal node on N.
P2 then gossips P1 back to N, now as a normal node! This is why handshake_start_time is never initialized for this new version of P1's representation on N.

I think this is just another argument for #1471.

In valkey-io#1441, we found a assert, and decided remove this assert and instead just free the newly created node and close the link, since if we cannot get the IP from the link it probably means the connection was closed. ``` === VALKEY BUG REPORT START: Cut & paste starting from here === 17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED === 17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true ------ STACK TRACE ------ 17847 valkey-server * src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634] src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de] /__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e] src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea] src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547] /lib64/libc.so.6(+0x40c8) [0x7f083985a0c8] /lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b] src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5] ``` But it also introduces another assert. The reason is that this new node is not added to the cluster nodes dict, we should just free it. ``` 17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED === 17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true ------ STACK TRACE ------ 17128 valkey-server * src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4] src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2] src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618] /__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278] src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09] src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5] src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e] ``` This closes valkey-io#1527. Signed-off-by: Binbin <[email protected]>

) In #1441, we found a assert, and decided remove this assert and instead just free the newly created node and close the link, since if we cannot get the IP from the link it probably means the connection was closed. ``` === VALKEY BUG REPORT START: Cut & paste starting from here === 17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED === 17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true ------ STACK TRACE ------ 17847 valkey-server * src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634] src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de] /__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e] src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea] src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547] /lib64/libc.so.6(+0x40c8) [0x7f083985a0c8] /lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b] src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5] ``` But it also introduces another assert. The reason is that this new node is not added to the cluster nodes dict. ``` 17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED === 17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true ------ STACK TRACE ------ 17128 valkey-server * src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4] src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2] src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618] /__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278] src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09] src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5] src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e] ``` This closes #1527. Signed-off-by: Binbin <[email protected]>

Update comments and log message in `cluster_legacy.c`. Follow-up from #1441. Signed-off-by: Pierre Turin <[email protected]> Co-authored-by: Ping Xie <[email protected]> Co-authored-by: Binbin <[email protected]>

…lkey-io#1535) In valkey-io#1441, we found a assert, and decided remove this assert and instead just free the newly created node and close the link, since if we cannot get the IP from the link it probably means the connection was closed. ``` === VALKEY BUG REPORT START: Cut & paste starting from here === 17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED === 17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true ------ STACK TRACE ------ 17847 valkey-server * src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634] src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de] /__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e] src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea] src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547] /lib64/libc.so.6(+0x40c8) [0x7f083985a0c8] /lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b] src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5] ``` But it also introduces another assert. The reason is that this new node is not added to the cluster nodes dict. ``` 17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED === 17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true ------ STACK TRACE ------ 17128 valkey-server * src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4] src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2] src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618] /__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278] src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09] src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5] src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e] ``` This closes valkey-io#1527. Signed-off-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>

Update comments and log message in `cluster_legacy.c`. Follow-up from valkey-io#1441. Signed-off-by: Pierre Turin <[email protected]> Co-authored-by: Ping Xie <[email protected]> Co-authored-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>

…y-io#1441) Add `meet_sent` field in `clusterNode` indicating the last time we sent a MEET packet. Use this field to only (re-)send a MEET packet once every handshake timeout period when detecting a node without an inbound link. When receiving multiple MEET packets on the same link while the node is in handshake state, instead of dropping the packet, we now simply prevent the creation of a new node. This way we still process the MEET packet's gossip and reply with a PONG as any other packets. Improve some logging messages to include `human_nodename`. Add `nodeExceedsHandshakeTimeout()` function. This is a follow-up to this previous PR: valkey-io#1307 And a partial fix to the crash described in: valkey-io#1436 --------- Signed-off-by: Pierre Turin <[email protected]>

…lkey-io#1535) In valkey-io#1441, we found a assert, and decided remove this assert and instead just free the newly created node and close the link, since if we cannot get the IP from the link it probably means the connection was closed. ``` === VALKEY BUG REPORT START: Cut & paste starting from here === 17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED === 17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true ------ STACK TRACE ------ 17847 valkey-server * src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634] src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de] /__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e] src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea] src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547] /lib64/libc.so.6(+0x40c8) [0x7f083985a0c8] /lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b] src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5] ``` But it also introduces another assert. The reason is that this new node is not added to the cluster nodes dict. ``` 17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED === 17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true ------ STACK TRACE ------ 17128 valkey-server * src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4] src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2] src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618] /__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278] src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09] src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5] src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e] ``` This closes valkey-io#1527. Signed-off-by: Binbin <[email protected]>

Update comments and log message in `cluster_legacy.c`. Follow-up from valkey-io#1441. Signed-off-by: Pierre Turin <[email protected]> Co-authored-by: Ping Xie <[email protected]> Co-authored-by: Binbin <[email protected]>

Update comments and log message in `cluster_legacy.c`. Follow-up from valkey-io#1441. Signed-off-by: Pierre Turin <[email protected]> Co-authored-by: Ping Xie <[email protected]> Co-authored-by: Binbin <[email protected]> Signed-off-by: Harkrishn Patro <[email protected]>

Update comments and log message in `cluster_legacy.c`. Follow-up from valkey-io#1441. Signed-off-by: Pierre Turin <[email protected]> Co-authored-by: Ping Xie <[email protected]> Co-authored-by: Binbin <[email protected]>

pieturin force-pushed the cluster-handshake-fix-followup branch 2 times, most recently from 629bfaa to e9b1b3c Compare December 13, 2024 23:53

enjoy-binbin approved these changes Dec 14, 2024

View reviewed changes

enjoy-binbin mentioned this pull request Dec 16, 2024

Drop the MEET packet if the link node is in handshake state #1436

Merged

pieturin added 2 commits December 16, 2024 22:42

Fix code format

855e277

Signed-off-by: Pierre Turin <[email protected]>

hpatro approved these changes Dec 16, 2024

View reviewed changes

src/cluster_legacy.c Outdated Show resolved Hide resolved

pieturin force-pushed the cluster-handshake-fix-followup branch from e9b1b3c to 4f82271 Compare December 17, 2024 00:06

PingXie reviewed Dec 17, 2024

View reviewed changes

src/cluster_legacy.c Outdated Show resolved Hide resolved

src/cluster_legacy.c Outdated Show resolved Hide resolved

src/cluster_legacy.c Show resolved Hide resolved

src/cluster_legacy.c Show resolved Hide resolved

pieturin added 3 commits December 17, 2024 03:49

Process gossip section on multiple MEET packets

fe3ac45

Address PR comments. Signed-off-by: Pierre Turin <[email protected]>

Deflake cluster-reliable-meet test

a8d221c

Node 1 might not be known from Node 0 if Node 2 does not gossip about it to Node 1. Signed-off-by: Pierre Turin <[email protected]>

Fix comment formatting

0ccc5cc

Signed-off-by: Pierre Turin <[email protected]>

pieturin added 3 commits December 21, 2024 00:21

Fix crash when receiving MEET from a disconnected link

c367ae8

When we cannot get the IP address from a link, instead of asserting we now free the link and delete the node. Signed-off-by: Pierre Turin <[email protected]>

Fix typo in test

1bc5f29

Signed-off-by: Pierre Turin <[email protected]>

Fix formatting

a542eb5

Signed-off-by: Pierre Turin <[email protected]>

PingXie approved these changes Dec 22, 2024

View reviewed changes

src/cluster_legacy.c Show resolved Hide resolved

src/cluster_legacy.c Show resolved Hide resolved

src/cluster_legacy.c Show resolved Hide resolved

src/cluster_legacy.c Show resolved Hide resolved

hwware merged commit e4179f1 into valkey-io:unstable Dec 30, 2024
50 checks passed

enjoy-binbin mentioned this pull request Jan 9, 2025

Fix crash when freeing newly created node when nodeIp2String fail #1535

Merged

pieturin deleted the cluster-handshake-fix-followup branch January 14, 2025 22:11

pieturin mentioned this pull request Jan 14, 2025

Update comments and log message in cluster_legacy.c #1561

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only (re-)send MEET packet once every handshake timeout period #1441

Only (re-)send MEET packet once every handshake timeout period #1441

pieturin commented Dec 13, 2024 •

edited

Loading

pieturin commented Dec 14, 2024

enjoy-binbin left a comment

pieturin commented Dec 17, 2024

PingXie left a comment •

edited

Loading

enjoy-binbin commented Dec 17, 2024

pieturin commented Dec 17, 2024

codecov bot commented Dec 17, 2024 •

edited

Loading

madolson commented Dec 17, 2024

pieturin commented Dec 17, 2024

PingXie commented Dec 18, 2024

pieturin commented Dec 19, 2024

pieturin commented Dec 20, 2024

pieturin commented Dec 20, 2024

pieturin commented Dec 21, 2024

PingXie left a comment

PingXie commented Dec 22, 2024 •

edited

Loading

Only (re-)send MEET packet once every handshake timeout period #1441

Only (re-)send MEET packet once every handshake timeout period #1441

Conversation

pieturin commented Dec 13, 2024 • edited Loading

pieturin commented Dec 14, 2024

enjoy-binbin left a comment

Choose a reason for hiding this comment

pieturin commented Dec 17, 2024

PingXie left a comment • edited Loading

Choose a reason for hiding this comment

enjoy-binbin commented Dec 17, 2024

pieturin commented Dec 17, 2024

codecov bot commented Dec 17, 2024 • edited Loading

Codecov Report

madolson commented Dec 17, 2024

pieturin commented Dec 17, 2024

PingXie commented Dec 18, 2024

pieturin commented Dec 19, 2024

pieturin commented Dec 20, 2024

pieturin commented Dec 20, 2024

pieturin commented Dec 21, 2024

PingXie left a comment

Choose a reason for hiding this comment

PingXie commented Dec 22, 2024 • edited Loading

pieturin commented Dec 13, 2024 •

edited

Loading

PingXie left a comment •

edited

Loading

codecov bot commented Dec 17, 2024 •

edited

Loading

PingXie commented Dec 22, 2024 •

edited

Loading