Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure ensembles reconfigure before nodes exit #578

Merged
merged 5 commits into from
Jun 4, 2014

Conversation

andrewjstone
Copy link
Contributor

See specific commit messages for more details

Prior to this change, the claimant transfered nodes to the invalid state
after they completed handoff and sent them a shutdown message. These
nodes, however,  often shutdown before the root ensemble membership was
changed. This resulted in the root ensemble becoming stuck because it
couldn't talk to the those nodes anymore. This change makes it so that
we wait for the removal of those members in the root ensemble on the
given nodes before putting them in the invalid state in the claimant.

Making the change above, leaves nodes in the exiting state without
shutting them down. In that case the following line returns the
exited nodes as available members in the cluster still and prevents them
from being removed from the root ensemble.

https://github.com/basho/riak_core/blob/develop/src/riak_core_claimant.erl#L766

Changing that line to: `Members = riak_core_ring:ready_members(Ring),`
causes these exited nodes to be seen as removed by the claimant,  and
update the members of the root ensemble so that the claimant can finally
transition to the invalid state and the removed nodes can be shut down.
Ensure all vnode modules that implement the ready_to_exit/0 callback
return true before transitioning vnodes from leaving to exiting state.
@andrewjstone
Copy link
Contributor Author

Part of solution to #572


RootNodes = [Node || {_, Node} <- RootMembers],
RootAdd = Members -- RootNodes,
RootDel = RootNodes -- Members,

Res = [riak_ensemble_manager:remove(node(), N) || N <- RootDel,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dialyzer com pains about this not be an existent function. Looking at the riak_ensemble_manager code, this doesn't seem to exist. Is there something I'm missing, or was this functionality changed at some point?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix is spread over multiple repos. That function is added in this
commit to riak_ensemble (part of basho/riak_ensemble#19) still waiting for review:
basho/riak_ensemble@fbc02b1

On Sat, May 31, 2014 at 1:26 PM, Reid Draper [email protected]
wrote:

In src/riak_core_claimant.erl:

 RootNodes = [Node || {_, Node} <- RootMembers],
 RootAdd = Members -- RootNodes,
 RootDel = RootNodes -- Members,
  • Res = [riak_ensemble_manager:remove(node(), N) || N <- RootDel,

cc @jtuple https://github.com/jtuple @andrewjstone
https://github.com/andrewjstone


Reply to this email directly or view it on GitHub
https://github.com/basho/riak_core/pull/578/files#r13262254.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the overarching issue: #572

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dialyzer complains about different things with the associated riak_ensemble branch. The new complaints appear to be ignored warnings that haven't had the line number adjusted yet.

@jtuple jtuple assigned jtuple and unassigned lordnull Jun 3, 2014
@jtuple
Copy link
Contributor

jtuple commented Jun 4, 2014

+1

Individual pull-request looks good. This is part of a larger set of pull-requests. Entire set of changes was reviewed/tested as part of basho/riak_test#593

FYI, looks like I forgot to assign myself to this review when I started reviewing these PRs a few weeks ago. Sorry for the duplicate effort @lordnull, but thanks for the extra eyes. Finally got back to this review and completed it, so going to go ahead and let it get merged in.

@andrewjstone
Copy link
Contributor Author

👍 69cb5c4

borshop added a commit that referenced this pull request Jun 4, 2014
Ensure ensembles reconfigure before nodes exit

Reviewed-by: andrewjstone
@andrewjstone
Copy link
Contributor Author

@borshop merge

@borshop borshop merged commit 69cb5c4 into develop Jun 4, 2014
@jtuple jtuple deleted the ajs/ensemble_remove_node branch June 4, 2014 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants