-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure ensembles reconfigure before nodes exit #578
Conversation
Prior to this change, the claimant transfered nodes to the invalid state after they completed handoff and sent them a shutdown message. These nodes, however, often shutdown before the root ensemble membership was changed. This resulted in the root ensemble becoming stuck because it couldn't talk to the those nodes anymore. This change makes it so that we wait for the removal of those members in the root ensemble on the given nodes before putting them in the invalid state in the claimant. Making the change above, leaves nodes in the exiting state without shutting them down. In that case the following line returns the exited nodes as available members in the cluster still and prevents them from being removed from the root ensemble. https://github.com/basho/riak_core/blob/develop/src/riak_core_claimant.erl#L766 Changing that line to: `Members = riak_core_ring:ready_members(Ring),` causes these exited nodes to be seen as removed by the claimant, and update the members of the root ensemble so that the claimant can finally transition to the invalid state and the removed nodes can be shut down.
Ensure all vnode modules that implement the ready_to_exit/0 callback return true before transitioning vnodes from leaving to exiting state.
Part of solution to #572 |
|
||
RootNodes = [Node || {_, Node} <- RootMembers], | ||
RootAdd = Members -- RootNodes, | ||
RootDel = RootNodes -- Members, | ||
|
||
Res = [riak_ensemble_manager:remove(node(), N) || N <- RootDel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dialyzer com pains about this not be an existent function. Looking at the riak_ensemble_manager code, this doesn't seem to exist. Is there something I'm missing, or was this functionality changed at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix is spread over multiple repos. That function is added in this
commit to riak_ensemble (part of basho/riak_ensemble#19) still waiting for review:
basho/riak_ensemble@fbc02b1
On Sat, May 31, 2014 at 1:26 PM, Reid Draper [email protected]
wrote:
In src/riak_core_claimant.erl:
RootNodes = [Node || {_, Node} <- RootMembers], RootAdd = Members -- RootNodes, RootDel = RootNodes -- Members,
- Res = [riak_ensemble_manager:remove(node(), N) || N <- RootDel,
cc @jtuple https://github.com/jtuple @andrewjstone
https://github.com/andrewjstone—
Reply to this email directly or view it on GitHub
https://github.com/basho/riak_core/pull/578/files#r13262254.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @lordnull @reiddraper ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the overarching issue: #572
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dialyzer complains about different things with the associated riak_ensemble branch. The new complaints appear to be ignored warnings that haven't had the line number adjusted yet.
+1 Individual pull-request looks good. This is part of a larger set of pull-requests. Entire set of changes was reviewed/tested as part of basho/riak_test#593 FYI, looks like I forgot to assign myself to this review when I started reviewing these PRs a few weeks ago. Sorry for the duplicate effort @lordnull, but thanks for the extra eyes. Finally got back to this review and completed it, so going to go ahead and let it get merged in. |
👍 69cb5c4 |
Ensure ensembles reconfigure before nodes exit Reviewed-by: andrewjstone
@borshop merge |
See specific commit messages for more details