Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Session disconnect doesn't shutdown work #16

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jdanbrown
Copy link
Contributor

In Cluster.connectionWatcher a KeeperState.Disconnected doesn't trigger forceShutdown, which is responsible for calling listener.shutdownWork and listener.onLeave. My guess as to why this is is that it's waiting for a KeeperState.Expired, which does trigger forceShutdown and is otherwise identical (modulo logging).

Based on my recent experience and the zk session state-transition docs, this allows the following behavior:

  1. Client partitions from zk cluster
  2. Client-side zk session timeout triggers, client receives KeeperState.Disconnected
  3. Clients stays partitioned from network for arbitrarily long, but ordasity continues running its previously-claimed work
  4. Client eventually rejoins network and regains route to zk cluster
  5. Client attempts to reconnect to the zk cluster, cluster says no, client receives KeeperState.Expired, ordasity finally shuts down work
  6. Client establishes new session, ordasity claims new work

This is harmful in my application since I want work ownership to be best-effort exclusive, i.e. nodes should minimize their overlap in work.

Is this a bug, or was ordasity intentionally designed to maximize this kind of overlap when nodes partition? I can imagine that being a useful—or at least not harmful—behavior in some settings. If so, maybe I can elaborate this proposed change to include a config option to maximize vs. minimize oblivious work overlap?

@jdanbrown
Copy link
Contributor Author

Ok, this change appears to violate some important assumptions—see the two FIXME commits above.

I'd be interested to hear your thoughts on how this could be made to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant