Gracefully handle mixed consensus configuration #559

Closed
jtuple opened this issue Jun 16, 2014 · 3 comments

@jtuple
Contributor

jtuple commented Jun 16, 2014

Currently, when not all nodes in a cluster have consensus enabled (strong_consistency = on), the nodes that do have it enabled (particularly the claimant and the root leader) have various processes crash while trying to send messages (e.g. gen_server:call) to processes that aren't running on the remote nodes (e.g. riak_ensemble_manager).

Test this scenario in more detail and make things not crash.

In the future, we should consider adding a proper consensus capability, but I'm not sure that's necessary, or that it's something we should add this late in the 2.0 game. Feel free to argue that point if you disagree.

It's entirely fine for 2.0 to ship with the condition that users must configure all nodes the same way with respect to consensus for things to work properly (e.g. reads/writes to consistent operations, ensembles coming up, etc). However, to activate consensus cluster-wide, a user will need to do a rolling restart to change the configuration, and we should gracefully handle the mixed case during this window (e.g. operations may fail, but we shouldn't be spamming the log with errors or crashing processes for no good reason).

/cc #536

@jtuple jtuple added this to the 2.0-RC milestone Jun 16, 2014
@andrewjstone andrewjstone self-assigned this Jun 24, 2014
engelsanchez added a commit to basho/riak_ensemble that referenced this issue Jun 25, 2014
Convert the exit exception from a gen_server call to a remote manager to
an error code. This makes it more natural to handle the situation.
This fix is related to issue basho/riak#559, to prevent crashing when
operating in mixed clusters where ensembles are not enabled on all
nodes.
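
The conversion described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the actual riak_ensemble change: the module name `safe_call_sketch` and the function `safe_call/3` are hypothetical. It relies only on the documented behaviour that `gen_server:call/3` exits with `{Reason, {gen_server, call, Args}}` when the target process is not running or the node is unreachable.

```erlang
-module(safe_call_sketch).
-export([safe_call/3]).

%% Hypothetical helper (not taken from riak_ensemble): wraps
%% gen_server:call/3 so that a failed call is returned as an error
%% tuple instead of propagating an exit into the calling process.
-spec safe_call(term(), term(), timeout()) -> {ok, term()} | {error, term()}.
safe_call(ServerRef, Request, Timeout) ->
    try gen_server:call(ServerRef, Request, Timeout) of
        Reply ->
            {ok, Reply}
    catch
        %% gen_server:call/3 exits with {Reason, {gen_server, call, Args}};
        %% Reason is e.g. noproc, {nodedown, Node} or timeout.
        exit:{Reason, {gen_server, call, _Args}} ->
            {error, Reason}
    end.
```

A caller on a consensus-enabled node could then match on `{error, noproc}` (for example after calling `safe_call({riak_ensemble_manager, Node}, Request, Timeout)`, where `Request` stands in for whatever message the caller sends) and treat the remote manager as unavailable rather than crashing.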
@jonmeredith
Contributor

Is merging #609 sufficient to close this?

@engelsanchez
Contributor

Yes. We tested mixed configurations, and the crash on manager down was the one issue found. @jtuple or @andrewjstone can re-open if any undocumented issue is found.

@andrewjstone
Contributor

Yep, that's all I know of. Valid close :)
