Make node removal work with riak_ensemble #572
This was referenced Apr 23, 2014
jtuple added a commit to basho/riak_test that referenced this issue Jun 3, 2014
ensemble_remove_node2 uses an intercept to prevent a riak_ensemble-related transition that is necessary for nodes to completely exit and shut down after removal. Testing this scenario is the entire point of the test: it exercises logic that was added to solve basho/riak_core#572, and that logic prevents nodes from exiting until the transition occurs.
However, even without this new logic, an unrelated riak_ensemble bug can trigger a race condition that also prevents nodes from shutting down. The good news is that other changes made as part of the fix for basho/riak_core#572 also fix this unrelated bug.
Therefore, this commit extends ensemble_remove_node2 to remove the intercept at the end of the test and verify that the removed nodes do exit as expected. The test now covers both the negative and positive scenarios and guards against future regressions that stall node removal/shutdown.
Fixed via basho/riak_ensemble#19, #578, basho/riak_kv#926, and basho/riak_test#593.
Currently, when a node is removed from or leaves a Riak cluster, it is not removed from the root ensemble. Removed nodes therefore look like offline/partitioned nodes and reduce the root ensemble's ability to meet quorum.
We need to fix this before shipping 2.0.
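To make the quorum impact concrete, here is a minimal sketch that assumes the root ensemble needs a strict majority of its listed members to be reachable. The `has_quorum` helper and the node names are hypothetical and are not part of riak_ensemble; the sketch only illustrates the arithmetic.

```python
def has_quorum(members, reachable):
    """Return True if a strict majority of `members` is in `reachable`."""
    alive = sum(1 for m in members if m in reachable)
    return alive > len(members) // 2


# Hypothetical five-node cluster; the names are illustrative only.
members = {"n1", "n2", "n3", "n4", "n5"}

# Two nodes have left the cluster but are still listed in the root ensemble.
removed = {"n4", "n5"}
up = members - removed

# All three remaining nodes are up: 3 of 5 is still a majority.
stale_all_up = has_quorum(members, up)

# One remaining node fails: 2 of 5 is no longer a majority.
up_after_failure = up - {"n3"}
stale_one_down = has_quorum(members, up_after_failure)

# Had the removed nodes been pruned from the membership, 2 of 3 would suffice.
pruned_one_down = has_quorum(members - removed, up_after_failure)

print(stale_all_up, stale_one_down, pruned_one_down)
```

Under this assumption, the stale membership list turns a single additional failure into a loss of quorum that a pruned membership list would have tolerated.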
A related issue is that removing a node and then rejoining a node with the same name is not safe -- even for non-root ensembles. This is because nothing ensures that old peer data is removed when a node leaves. Regardless of node name, the two instances of the node should be treated as logically disjoint peers with no shared history.
/cc basho/riak#536