Update ensemble bootstrap logic to enable consensus #571
jtuple
changed the title
Update ensemble bootstrap logic enable consensus
Update ensemble bootstrap logic to enable consensus
Apr 11, 2014
21 tasks
Because there's a workaround, it's OK to defer this until after 2.0 if it's addressed in the docs.
jtuple
added a commit
that referenced
this issue
Jun 5, 2014
Currently, in addition to enabling consensus in app.config, a user must also manually call 'riak_ensemble_manager:enable()' from one and only one node in a cluster to activate the consensus sub-system. This is necessary to ensure that there is only a single logical root ensemble history -- all other nodes adopt the history from the single enabled node.

However, this step is not only annoying but also error-prone. Enabling consensus on multiple nodes can break the consensus system, requiring manual intervention.

This commit addresses this problem by making riak_core automatically enable the consensus system in a safe way. This is accomplished by having the claimant node enable the consensus system.

To avoid the issue where the claimant in multiple 1-node clusters enables consensus before being joined, this commit requires the cluster to have at least three nodes before the claimant will enable the consensus system.

To prevent a race during claimant changes, a claimant must first write a special ring metadata value that prevents future claimants from activating the consensus system. It is not until after the ring has converged cluster wide, and the claimant sees the appropriate metadata value, that the claimant activates the consensus system.

Resolves #571
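The reason for the three-node threshold can be illustrated with a toy model. The following is a minimal Python sketch, not riak_core code: `Cluster`, `may_start_enabling`, and `join` are hypothetical names, and keeping the first cluster's claimant on a join is a simplifying assumption. It only shows that three standalone nodes, each the claimant of its own 1-node cluster, would all act without the threshold, while with it nothing happens until they are joined.

```python
from dataclasses import dataclass
from typing import FrozenSet

MIN_NODES = 3  # threshold stated in the commit message


@dataclass(frozen=True)
class Cluster:
    """Toy cluster: a set of member names plus the current claimant."""
    members: FrozenSet[str]
    claimant: str


def may_start_enabling(node: str, cluster: Cluster, min_nodes: int = MIN_NODES) -> bool:
    """True if `node` is the claimant of a cluster large enough to begin
    the enable sequence (writing the ring metadata value would come next)."""
    return node == cluster.claimant and len(cluster.members) >= min_nodes


def join(a: Cluster, b: Cluster) -> Cluster:
    """Merge two clusters; keeping `a`'s claimant is a simplification."""
    return Cluster(a.members | b.members, a.claimant)


# Three freshly started nodes are each the claimant of a 1-node cluster.
singles = [Cluster(frozenset({n}), n) for n in ("n1", "n2", "n3")]

# With no threshold, every one of them would start its own root history.
eager = [c.claimant for c in singles if may_start_enabling(c.claimant, c, min_nodes=1)]

# With the threshold, none acts until the nodes are joined together.
gated = [c.claimant for c in singles if may_start_enabling(c.claimant, c)]

merged = join(join(singles[0], singles[1]), singles[2])
ready = [n for n in sorted(merged.members) if may_start_enabling(n, merged)]
```

Here `eager` contains all three nodes, `gated` is empty, and `ready` contains only the claimant of the merged cluster. The threshold alone does not cover claimant changes; that is what the ring metadata value described above is for.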
jtuple
added a commit
to basho/riak_test
that referenced
this issue
Jun 5, 2014
Prior to this commit, the various riak_ensemble related tests would manually enable the consensus system on one-and-only-one node in a given cluster in order to work around issue basho/riak_core#571. This commit changes the tests to work properly after the above issue has been fixed. In addition to removing the call to riak_ensemble_manager:enable() that is now handled automatically by Riak, this commit also removes a few wait_until_stable/2 checks against 1-node clusters. These checks no longer apply, since Riak is now designed to only enable the consensus system after the cluster contains at least 3 nodes.
This was referenced Jun 5, 2014
The manual work-around is error-prone and a bit more complicated than described in this issue, given some corner cases. So I decided to just fix it properly. Pull request #601 is where the action is.
Fixed in #601.
In basho/riak_ensemble#10, the clustering system for riak_ensemble was changed to require a manual step to enable the consensus system on a single node before joining nodes together into a cluster. This change was made to ensure that there is only ever a single logical root ensemble history. Joining nodes adopt the history from the single enabled node.

Thus, a user must now run
riak_ensemble_manager:enable().
from the console of the Riak claimant to enable strong consistency.

We should change the bootstrap logic in riak_core_claimant to perform this step automatically. The primary challenge is ensuring only a single node ever performs that step, even in the presence of node failures/partitions.

One way to implement this is to have the current claimant wait for "ring ready" and then check for a ring metadata key that notes the consensus singleton node. If the key is missing, the claimant writes itself as the singleton and waits yet again for "ring ready". If the claimant sees itself as the singleton in a ready ring, the claimant then enables the consensus system. Likewise, the bootstrap logic would be updated to join nodes to the known singleton (based on the metadata value) rather than the current claimant.
/cc basho/riak#536