Make cluster meet reliable under link failures #461
Posting this for initial comments. I can migrate the test based on the new framework once #442 is merged.
Codecov Report: All modified and coverable lines are covered by tests ✅
Coverage diff, unstable vs. #461:
Coverage: 70.22% to 70.20% (-0.02%)
Files: 109 to 109
Lines: 59956 to 59967 (+11)
Hits: 42104 to 42102 (-2)
Misses: 17852 to 17865 (+13)
I think it's worth investing in redis/redis#11095 to avoid this issue altogether.
Thanks, I wasn't aware of this linked issue. IMO these two issues can be solved independently. The linked issue tries to improve the admin experience for the MEET command, whereas this PR addresses a specific gap in the MEET implementation.
The problem addressed in this PR (asymmetric cluster membership) can happen with SYNC MEET as well due to link failures, so it is worth solving. The handshake nodes will still be removed after the handshake timeout (same as node_timeout of 15s). Wdyt?
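To make the timeout point concrete, here is a minimal sketch of the expiry check being described. The function name, the macro, and the millisecond clock are assumptions for illustration and are not taken from the code base; only the 15s value comes from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the 15s node_timeout quoted in the comment, in milliseconds. */
#define TOY_NODE_TIMEOUT_MS 15000

/* Hypothetical helper: returns true once a node that is still in handshake
 * state has existed for longer than the handshake timeout and should be
 * removed from the node table. */
static bool toy_handshake_expired(int64_t now_ms, int64_t created_ms,
                                  int64_t handshake_timeout_ms) {
    assert(now_ms >= created_ms); /* clock is assumed not to go backwards */
    return now_ms - created_ms > handshake_timeout_ms;
}
```

With this model, a sender that keeps retrying MEET is still bounded: once the handshake entry is older than the timeout it is dropped, retries or not.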
Yeah, I still believe this is a problem even with #11095.
Awesome material for our next release which will be full of cluster improvements. Is it worth mentioning in release notes? Btw @srgsanky you need to commit with -s. See the instructions on the DCO CI job's details page.
I would also be inclined to backport it.
When I tried to merge the new changes into my fork, I ended up with a merge commit.
I want to sign off just 49a884c, but the rebase is adding a signoff to all commits 315b757..d52c8f3, which were not made by me. Do you have any recommendation to fix this? As an alternative, I can start fresh and add a new commit from the tip of unstable. I am not sure if I will be able to reuse this PR.
Force-pushed from d8aa71c to 2ff9879.
I believe it's possible to undo a merge by [...]. If nothing works, then it's always possible to start from scratch with a new branch and cherry-pick all your commits into it. Then you can rename the branches and force-push to this PR's branch.
Just some minor nitpicks around the tests; overall it LGTM.
When a link failure occurs while a MEET request is in flight, the sending node stops sending MEET and starts sending PINGs. Since every node responds to PINGs from unknown nodes with a PONG, the receiving node never adds the sending node, but the sending node adds the receiving node when it sees a PONG. This can lead to asymmetry in cluster membership. This change makes the sender keep sending MEET until it sees a PONG, avoiding the asymmetry.
Signed-off-by: Sankar <[email protected]>
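The failure mode in this commit message can be reproduced with a toy model. This is only a sketch under stated assumptions: the types, names, and the "first N packets are lost" link model are hypothetical and are not the cluster bus implementation. It compares the old behavior (only the first attempt is a MEET) with the new one (MEET is repeated until a PONG arrives).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message types; not the real cluster bus constants. */
typedef enum { TOY_MSG_MEET, TOY_MSG_PING } toy_msg;

typedef struct {
    bool knows_peer;   /* the peer is in this node's node table */
    bool meet_pending; /* sender only: no PONG seen for the MEET yet */
} toy_node;

/* Receiver side: a MEET adds the sender, a PING from an unknown node does
 * not. A PONG is returned either way (true means "PONG sent"). */
static bool toy_receive(toy_node *receiver, toy_msg type) {
    if (type == TOY_MSG_MEET) receiver->knows_peer = true;
    return true;
}

/* Runs `rounds` send attempts; the link loses the first `dropped` packets.
 * keep_meet_until_pong == false models the old behavior, true the new one.
 * Returns true if both sides end up with the same view of membership. */
static bool toy_handshake(int rounds, int dropped, bool keep_meet_until_pong) {
    toy_node sender = {false, true};
    toy_node receiver = {false, false};
    assert(rounds >= 0 && dropped >= 0);
    for (int i = 0; i < rounds; i++) {
        toy_msg type = sender.meet_pending ? TOY_MSG_MEET : TOY_MSG_PING;
        /* Old behavior: give up on MEET after one attempt, delivered or not. */
        if (!keep_meet_until_pong) sender.meet_pending = false;
        if (i < dropped) continue; /* link failure: the packet is lost */
        if (toy_receive(&receiver, type)) {
            sender.knows_peer = true;    /* sender adds the receiver on PONG */
            sender.meet_pending = false; /* new behavior clears it only here */
        }
    }
    return sender.knows_peer == receiver.knows_peer;
}
```

In this model, losing a single packet is enough to leave the old behavior asymmetric, while the new behavior converges as soon as one MEET gets through.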
Correct. It starts processing when we drop the filter, which can be the 4th or later.
This worked. Thanks!
I tried this and all the commits in the other branch of the merge were also annotated with my signoff. So, I decided to ask you folks for the best approach. Btw, is there any reasoning behind the requirement for the signoff?
Technically we adopted it because it's an LF requirement, but it's also a good practice to force a trail of who committed what.
This change LGTM overall.
1. Reworked code comment
2. Added serverLogs
3. Renamed debug variable
4. Made close link filter directly coupled with drop filter

Signed-off-by: Sankar <[email protected]>
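As a rough illustration of what coupling the close-link behavior to the drop filter could look like in a test hook, here is a sketch. Every name here is hypothetical, and the real debug option may be structured quite differently; the only idea taken from the commit message is that dropping a packet and closing its link happen together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FILTER_OFF (-1) /* hypothetical sentinel: nothing is filtered */

typedef struct {
    int drop_type; /* packet type to drop, or TOY_FILTER_OFF */
} toy_filter;

typedef struct {
    bool link_open; /* false once a dropped packet has closed the link */
    int processed;  /* packets that reached normal processing */
} toy_receiver;

/* Returns true if the packet was processed, false if the filter dropped it.
 * Dropping and closing the link are coupled: one always implies the other. */
static bool toy_deliver(const toy_filter *f, toy_receiver *r, int type) {
    assert(f != NULL && r != NULL);
    if (f->drop_type != TOY_FILTER_OFF && f->drop_type == type) {
        r->link_open = false; /* the sender has to reconnect before retrying */
        return false;
    }
    r->link_open = true;
    r->processed++;
    return true;
}
```

A test built on a hook like this can drop several packets of one type, then turn the filter off and check that the next one is processed.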
Multiple MEETs will be handled like a normal PING message.
Signed-off-by: Sankar <[email protected]>
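A small sketch of the receiver-side decision this commit describes, assuming hypothetical names and a simplified view where the only inputs are the message type and whether the sender is already known:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_MSG_MEET, TOY_MSG_PING } toy_msg;

typedef enum {
    TOY_ADD_SENDER,    /* unknown sender + MEET: start tracking the sender */
    TOY_REPLY_ONLY,    /* unknown sender + PING: reply with PONG, add nothing */
    TOY_HANDLE_AS_PING /* known sender: MEET and PING take the same path */
} toy_action;

/* Hypothetical classifier: a repeated MEET from a node that is already
 * known is not treated specially, it goes down the normal PING path. */
static toy_action toy_classify(bool sender_known, toy_msg type) {
    assert(type == TOY_MSG_MEET || type == TOY_MSG_PING);
    if (sender_known) return TOY_HANDLE_AS_PING;
    return type == TOY_MSG_MEET ? TOY_ADD_SENDER : TOY_REPLY_ONLY;
}
```

This is what makes retrying MEET safe in the model: a duplicate MEET that arrives after the first one succeeded changes nothing beyond what a PING would.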
The clang-format checker is currently failing due to changes introduced by another PR. Mentioned this in #118 (comment)
Sorry, maybe I missed some. Fixed in #570.
ref:
- #118 (my previous change)
- #461 (reporting that the clang-format checker fails due to my change)

The clang-format checker was failing. I don't know why I missed it or why it wasn't caught. Running `clang-format -i bitops.c` was all that was needed.
Signed-off-by: LiiNen <[email protected]>
Thanks!
No worries. These can all be addressed incrementally.
When a link failure occurs while a MEET request is in flight, the sending node stops sending MEET and starts sending PINGs. Since every node responds to PINGs from unknown nodes with a PONG, the receiving node never adds the sending node, but the sending node adds the receiving node when it sees a PONG. This can lead to asymmetry in cluster membership. This change makes the sender keep sending MEET until it sees a PONG, avoiding the asymmetry.
---------
Signed-off-by: Sankar <[email protected]>
When a link failure occurs while a MEET request is in flight, the sending node stops sending MEET and starts sending PINGs. Since every node responds to PINGs from unknown nodes with a PONG, the receiving node never adds the sending node, but the sending node adds the receiving node when it sees a PONG. This can lead to asymmetry in cluster membership. This change makes the sender keep sending MEET until it sees a PONG, avoiding the asymmetry.
---------
Signed-off-by: Sankar <[email protected]>
Signed-off-by: Ping Xie <[email protected]>