Cluster Aggregator Pipeline Blocks Akka Cluster Formation #89

Open
vjkoskela opened this issue Jul 20, 2018 · 0 comments

We had a scenario where a bad (invalid JSON) cluster pipeline configuration was deployed. When the cluster restarted, the Akka cluster appeared unhealthy, with nodes marking each other as unreachable:

{"time":"2018-07-20T02:58:34.790Z","name":"log","level":"warn","data":{"message":"Cluster Node [akka.tcp://Metrics@iad4f-re22-2a:2551] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://Metrics@iad4d-rd41-38a:2551, status = Up)]. Node roles [dc-default]"},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-28","logger":"a.c.ClusterCoreDaemon"},"id":"ad8b61f7-4a18-479f-bf98-8ea3289b1f52","version":"0"}

Eventually each node would output:

{"time":"2018-07-20T02:58:20.809Z","name":"log","level":"warn","data":{"message":"Association with remote system [akka.tcp://Metrics@iad4c-rf14-36a:2551] has failed, address is now gated for [5000] ms. Reason: [Disassociated] "},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-33","logger":"a.r.ReliableDeliverySupervisor"},"id":"7b7baeff-df15-4846-90ba-891a162b3e51","version":"0"}

You would also see some of these:

{"time":"2018-07-20T02:58:27.961Z","name":"log","level":"warn","data":{"message":"heartbeat interval is growing too large: 2001 millis"},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-67","logger":"a.r.PhiAccrualFailureDetector"},"id":"8c7597b8-3dda-432d-a26c-6eed5a26df1b","version":"0"}

And then there was the scary one:

{"time":"2018-07-20T02:58:58.790Z","name":"log","level":"info","data":{"message":"Cluster Node [akka.tcp://Metrics@iad4f-re22-2a:2551] - Leader can currently not perform its duties, reachability status: [akka.tcp://Metrics@iad4c-rf14-36a:2551 -> akka.tcp://Metrics@iad4a-rl2-17c:2551: Reachable [Unreachable] (26), akka.tcp://Metrics@iad4c-rf14-36a:2551 -> akka.tcp://Metrics@iad4d-rd41-38a:2551: Unreachable [Unreachable] (12), akka.tcp://Metrics@iad4c-rf14-36a:2551 -> akka.tcp://Metrics@iad4f-re22-2a:2551: Unreachable [Unreachable] (19), akka.tcp://Metrics@iad4d-rd41-38a:2551 -> akka.tcp://Metrics@iad4a-rl2-17c:2551: Reachable [Unreachable] (32), akka.tcp://Metrics@iad4d-rd41-38a:2551 -> akka.tcp://Metrics@iad4c-rf14-36a:2551: Unreachable [Unreachable] (31), akka.tcp://Metrics@iad4d-rd41-38a:2551 -> akka.tcp://Metrics@iad4f-re22-2a:2551: Unreachable [Unreachable] (30), akka.tcp://Metrics@iad4f-re22-2a:2551 -> akka.tcp://Metrics@iad4a-rl2-17c:2551: Unreachable [Unreachable] (27), akka.tcp://Metrics@iad4f-re22-2a:2551 -> akka.tcp://Metrics@iad4c-rf14-36a:2551: Unreachable [Unreachable] (26), akka.tcp://Metrics@iad4f-re22-2a:2551 -> akka.tcp://Metrics@iad4d-rd41-38a:2551: Unreachable [Unreachable] (25)], member status: [akka.tcp://Metrics@iad4a-rl2-17c:2551 Up seen=false, akka.tcp://Metrics@iad4c-rf14-36a:2551 Up seen=false, akka.tcp://Metrics@iad4d-rd41-38a:2551 Up seen=false, akka.tcp://Metrics@iad4f-re22-2a:2551 Up seen=true]"},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-48","logger":"a.c.Cluster(akka://Metrics)"},"id":"220dc047-2a81-4507-802d-203cd7902b27","version":"0"}

So it seems that Akka cluster formation depends on successfully loading the cluster pipeline. Intuitively, this should not be the case. At the very least, if this dependency must exist, then cluster formation should not even be attempted when the cluster pipeline configuration cannot be loaded.
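To illustrate the second option (do not attempt cluster formation when the configuration cannot be loaded), here is a minimal sketch in Python. It is not the aggregator's actual startup code, which I have not reproduced here. The names `load_pipeline_config`, `start_node`, `PipelineConfigError`, and the `join_cluster` callback are all hypothetical, and the only validation assumed is that the file parses as a JSON object.

```python
import json
from pathlib import Path
from typing import Callable


class PipelineConfigError(Exception):
    """Raised when the cluster pipeline configuration cannot be loaded."""


def load_pipeline_config(path: Path) -> dict:
    """Parse the pipeline configuration, failing loudly on unreadable or invalid JSON."""
    try:
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        config = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        raise PipelineConfigError(f"cannot load {path}: {exc}") from exc
    if not isinstance(config, dict):
        raise PipelineConfigError(f"{path} must contain a JSON object")
    return config


def start_node(config_path: Path, join_cluster: Callable[[dict], None]) -> bool:
    """Join the cluster only after the configuration has loaded.

    `join_cluster` is a hypothetical stand-in for whatever starts cluster
    membership. Returns True if the join was attempted, False if startup
    was aborted before any join attempt.
    """
    try:
        config = load_pipeline_config(config_path)
    except PipelineConfigError as exc:
        print(f"aborting startup, cluster join skipped: {exc}")
        return False
    join_cluster(config)
    return True
```

With this ordering, a node with a broken configuration fails fast and never becomes a half-functional cluster member that its peers then have to mark as unreachable.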
