Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests for removing nodes without breaking riak_ensemble #593

Merged
merged 8 commits into from
Jun 4, 2014

Conversation

andrewjstone
Copy link
Contributor

@andrewjstone
Copy link
Contributor Author

@lordnull
Copy link
Contributor

lordnull commented Jun 2, 2014

This needs a rebase; one of the tests references rt_intercept_pt which exists on master but not on this branch.

Running a rebased version causes a failure still, though:

11:26:08.225 [warning] ensemble_remove_node failed: {{assertEqual_failed,[{module,ensemble_remove_node},{line,70},{expression,"length ( Cluster )"},{expected,1},{value,0}]},[{ensemble_remove_node,'-confirm/0-fun-7-',2,[{file,"tests/ensemble_remove_node.erl"},{line,70}]},{ensemble_remove_node,confirm,0,[{file,"tests/ensemble_remove_node.erl"},{line,70}]},{riak_test_runner,return_to_exit,3,[{file,"src/riak_test_runner.erl"},{line,159}]}]}
11:26:08.225 [error] Error in process <0.190.0> on node '[email protected]' with exit value: {{assertEqual_failed,[{module,ensemble_remove_node},{line,70},{expression,"length ( Cluster )"},{expected,1},{value,0}]},[{ensemble_remove_node,'-confirm/0-fun-7-',2,[{file,"tests/ensemble_remove_node.erl"},{line,70}]},{ensemble_remove_node... 


11:26:08.225 [error] 
================ ensemble_remove_node failure stack trace =====================
{{assertEqual_failed,[{module,ensemble_remove_node},
                      {line,70},
                      {expression,"length ( Cluster )"},
                      {expected,1},
                      {value,0}]},
 [{ensemble_remove_node,'-confirm/0-fun-7-',2,
                        [{file,"tests/ensemble_remove_node.erl"},{line,70}]},
  {ensemble_remove_node,confirm,0,
                        [{file,"tests/ensemble_remove_node.erl"},{line,70}]},
  {riak_test_runner,return_to_exit,3,
                    [{file,"src/riak_test_runner.erl"},{line,159}]}]}
===============================================================================

ensemble_remove_node2 uses an intercept to prevent a riak_ensemble
related transition that is necessary for nodes to completely exit and
shutdown after removal. In fact, testing for this scenario is the
entire point of this test, since it is testing logic that was added to
solve basho/riak_core#572 and that logic prevents nodes from exiting
until that transition occurs.

However, even without this new logic, there is an unrelated
riak_ensemble related bug that can trigger a race condition that also
prevents nodes from shutting down.

The good news is that other changes made as part of the solution to
solve basho/riak_core#572 also fix this unrelated bug. Therefore this
commit extends ensemble_remove_node2 to remove the intercept at the
end of the test and verify that the removed nodes do actually end up
exiting as expected. Thus, the test now tests for both the negative
and positive scenarios and serves as a test against future regressions
that stall node removal/shutdown.
@jtuple
Copy link
Contributor

jtuple commented Jun 3, 2014

This pull-request adds two new tests: ensemble_remove_node and ensemble_remove_node2.

As desired, ensemble_remove_node fails against develop without the related pull-requests applied; and passes with the pull-requests applied.

ensemble_remove_node2 is a bit tricker. It tests a condition that should only be true after merging the necessary pull-requests. Yet, it still passes against develop. Why? Well, it turns out there is an unrelated bug that can hit a race condition which mirrors the condition the test is checking. As luck would have it, this bug is indirectly fixed by the changes in the associated pull-requests as well.

So, I've extended the test slightly in f822e52 to explicitly verify that the intercept used in the test is the reason why nodes are not exiting, rather than from the unrelated bug that previously existed on master. As extended, the test now fails against develop, but passes after having the necessary pull-requests applied.

So, +1 on these tests. Although, please double check my changes to the test.

Once the related pull-requests are reviewed/merged, these tests can be merged in. No need to rebase as @lordnull suggested as this pull-request merges cleanly and work just fine after merging.

@jtuple jtuple assigned jtuple and unassigned lordnull Jun 3, 2014
@jtuple
Copy link
Contributor

jtuple commented Jun 4, 2014

Finished reviewing complete feature change across all pull-requests. +1 to merge basho/riak_ensemble#19, basho/riak_core#578, and basho/riak_kv#926

FYI, looks like I forgot to assign myself to this issue when I first started this review ~2 weeks ago. Sorry for the duplicate effort @lordnull, but thanks for the extra pair of eyes.

@andrewjstone
Copy link
Contributor Author

Test extension looks good @jtuple
👍

@jtuple jtuple merged commit 100180e into master Jun 4, 2014
@jtuple jtuple deleted the ajs/ensemble_remove_node branch June 4, 2014 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants