Gracefully handle mixed consensus configuration #559
engelsanchez added a commit to basho/riak_ensemble that referenced this issue on Jun 25, 2014:
Convert the exit exception from a gen_server call to a remote manager to an error code. This makes it more natural to handle the situation. This fix is related to issue basho/riak#559, to prevent crashing when operating in mixed clusters where ensembles are not enabled on all nodes.
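The idea in that commit message can be sketched roughly as follows. This is a minimal illustration, not the actual riak_ensemble change: the module name `safe_call_sketch`, the function `call/3`, and the `{ok, Reply}` wrapping are all assumptions made for this example. It relies only on the documented behaviour that `gen_server:call/3` exits the caller with `{Reason, CallInfo}` when the target process is not running or the call fails.

```erlang
-module(safe_call_sketch).
-export([call/3]).

%% Hypothetical helper (not the riak_ensemble implementation): wrap
%% gen_server:call/3 so that a missing or unreachable server produces
%% an error tuple instead of an exit in the calling process.
-spec call(ServerRef :: term(), Msg :: term(), Timeout :: timeout()) ->
          {ok, term()} | {error, term()}.
call(ServerRef, Msg, Timeout) ->
    try
        {ok, gen_server:call(ServerRef, Msg, Timeout)}
    catch
        %% gen_server:call/3 exits with {Reason, {gen_server, call, Args}};
        %% keep only the reason (for example noproc or timeout).
        exit:{Reason, _CallInfo} ->
            {error, Reason};
        %% Any other exit shape is passed through unchanged as the reason.
        exit:Reason ->
            {error, Reason}
    end.
```

A caller on a consensus-enabled node could then use something like `safe_call_sketch:call({riak_ensemble_manager, Node}, Msg, 5000)` (where `Msg` stands in for whatever request is being sent) and match on `{error, _}` rather than crashing when the manager is not running on `Node`. Calling a name that is not registered locally returns `{error, noproc}`.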
Is merging #609 sufficient to close this?
Yes. We tested mixed configurations and the crash on manager down was the one issue found. Let @jtuple or @andrewjstone re-open if any undocumented issue was found.
Yep, that's all I know of. Valid close :)
Currently, when not all nodes in a cluster have consensus (`strong_consistency = on`) configured, nodes with consensus enabled (particularly the claimant as well as the root leader) have various processes crash trying to send messages (e.g. `gen_server:call`) to processes that aren't running on the remote nodes (e.g. `riak_ensemble_manager`).
Test this scenario in more detail and make things not crash.
In the future, we should consider adding a proper consensus capability, but I'm not sure that's necessary nor something we should add this late in the 2.0 game. But, feel free to argue that point if you disagree.
It's entirely fine for 2.0 to ship with the condition that users must configure all nodes the same as far as consensus goes for things to work properly (e.g. reads/writes to consistent operations, ensembles coming up, etc.). However, for a user to activate consensus cluster-wide, they'll need to do a rolling restart to change the configuration, and we should gracefully handle the mixed case during this window (e.g. operations may fail, but we shouldn't be spamming the log with errors or crashing processes for no good reason).
/cc #536