Make node removal work with riak_ensemble #572

Closed
jtuple opened this issue Apr 11, 2014 · 1 comment

jtuple commented Apr 11, 2014

Currently, removing/leaving a node from a Riak cluster does not remove the node from the root ensemble. Thus, removed nodes look like offline/partitioned nodes and affect the ability of the root ensemble to meet quorum.
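To illustrate the quorum effect described above, here is a minimal sketch. It assumes a simple strict-majority quorum rule; the `has_quorum` helper and the node names are hypothetical and are not part of riak_ensemble's API.

```python
def has_quorum(members, reachable):
    """Return True if a strict majority of the ensemble's members is reachable.

    Assumption: quorum is a strict majority of the member list.
    """
    members = set(members)
    live = members & set(reachable)
    return len(live) > len(members) // 2


# Hypothetical five-node root ensemble.
members = {"node1", "node2", "node3", "node4", "node5"}

# Three nodes leave the cluster, but the member list is never updated,
# so they are indistinguishable from offline or partitioned nodes.
remaining = {"node1", "node2"}
stale = has_quorum(members, remaining)

# If the departed nodes were also removed from the member list,
# the two remaining nodes would form a majority on their own.
cleaned = has_quorum(remaining, remaining)
```

With the stale member list, two reachable nodes out of five cannot form a majority, while the same two nodes can once the member list reflects the removal.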

We need to fix this before shipping 2.0.

A related issue is that removing and rejoining a node of the same name is not safe -- even for non-root ensembles. This is because nothing is done to ensure old peer data is removed when leaving. Regardless of node name, the two instances of the same node should be considered logically disjoint peers with no shared history.
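One way to picture "logically disjoint peers" is to make a peer's identity depend on more than the node name. The sketch below is only an illustration of that idea, not how riak_ensemble identifies peers; `PeerId` and its `incarnation` field are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PeerId:
    """Hypothetical peer identity: node name plus a per-join incarnation tag."""
    node_name: str
    # A fresh random tag is generated each time the node joins.
    incarnation: str = field(default_factory=lambda: uuid.uuid4().hex)


# The same node name joins, leaves, and then joins again.
old = PeerId("riak@10.0.0.1")
new = PeerId("riak@10.0.0.1")

same_name = old.node_name == new.node_name
distinct = old != new  # equality compares both fields, so the peers differ
```

Under this scheme, state recorded for the old incarnation can never be attributed to the rejoined node, even though the names match.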

/cc basho/riak#536

@jtuple jtuple added this to the 2.0-RC milestone Apr 11, 2014
@jtuple jtuple added the Bug label Apr 11, 2014
jtuple added a commit to basho/riak_test that referenced this issue Jun 3, 2014
ensemble_remove_node2 uses an intercept to prevent a riak_ensemble-related
transition that is necessary for nodes to completely exit and shut down
after removal. In fact, testing for this scenario is the entire point of
this test, since it is testing logic that was added to solve
basho/riak_core#572, and that logic prevents nodes from exiting until that
transition occurs.

However, even without this new logic, there is an unrelated
riak_ensemble bug that can trigger a race condition that also
prevents nodes from shutting down.

The good news is that other changes made as part of the solution to
basho/riak_core#572 also fix this unrelated bug. Therefore this
commit extends ensemble_remove_node2 to remove the intercept at the
end of the test and verify that the removed nodes do actually end up
exiting as expected. Thus, the test now covers both the negative
and positive scenarios and guards against future regressions
that stall node removal/shutdown.

jtuple commented Jun 4, 2014
