The problem described below seems to be mostly mitigated in production by setting downAfterMilliseconds and failoverTimeout, but it can still occur.
To Reproduce
Steps to reproduce the behavior:
1. Use the default values for downAfterMilliseconds and failoverTimeout
2. Restart the leader Pod, or trigger a rolling update
3. Be (un-)lucky
Logs
Nodes after restarting the master:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-redis-node-2 2/2 Running 0 119s 10.42.0.166 k3d-projectsyn-server-0 <none> <none>
test-redis-node-1 2/2 Running 0 69s 10.42.0.167 k3d-projectsyn-server-0 <none> <none>
test-redis-node-0 0/2 CrashLoopBackOff 2 29s 10.42.0.168 k3d-projectsyn-server-0 <none> <none>
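To pick the failing Pod out of a listing like the one above without reading it by eye, a minimal sketch along these lines could be used. It assumes the whitespace-separated column layout shown above; the function name `unready_pods` is hypothetical and not part of any tool mentioned in this issue.

```python
def unready_pods(listing: str) -> list[tuple[str, str]]:
    """Return (name, status) for every pod whose READY column is not n/n."""
    result = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a pod row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if ready_count != total:
            result.append((name, status))
    return result


# Sample input copied from the listing in this issue.
LISTING = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-redis-node-2 2/2 Running 0 119s 10.42.0.166 k3d-projectsyn-server-0 <none> <none>
test-redis-node-1 2/2 Running 0 69s 10.42.0.167 k3d-projectsyn-server-0 <none> <none>
test-redis-node-0 0/2 CrashLoopBackOff 2 29s 10.42.0.168 k3d-projectsyn-server-0 <none> <none>
"""

if __name__ == "__main__":
    for name, status in unready_pods(LISTING):
        print(f"{name}: {status}")
```

For the listing above, only the restarted former leader is reported as not ready.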
Log of former leader test-redis-node-0
12:40:20.33 INFO ==> test-redis-headless.redis-test.svc.cluster.local has my IP: 10.42.0.168
12:40:20.34 INFO ==> Cleaning sentinels in sentinel node: 10.42.0.167
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
1
12:40:25.34 INFO ==> Cleaning sentinels in sentinel node: 10.42.0.166
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
1
12:40:30.35 INFO ==> Sentinels clean up done
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at test-redis.redis-test.svc.cluster.local:26379: Connection refused
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at -p:6379: Name or service not known
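The two connection errors at the end of this log are the interesting part. A rough sketch like the following could extract them from a log with one message per line. The names `connection_failures` and `flag_like_hosts` are hypothetical helpers. The idea that a host such as `-p` means the host value passed to redis-cli was empty is only one possible reading and is not confirmed by the log itself.

```python
import re

# Matches the redis-cli error format seen in the log:
#   Could not connect to Redis at <host>:<port>: <reason>
CONNECT_ERROR = re.compile(r"Could not connect to Redis at (\S+):(\d+): (.+)")


def connection_failures(log_text: str) -> list[tuple[str, int, str]]:
    """Return (host, port, reason) for each connection failure in the log."""
    return [
        (m.group(1), int(m.group(2)), m.group(3).strip())
        for m in CONNECT_ERROR.finditer(log_text)
    ]


def flag_like_hosts(failures: list[tuple[str, int, str]]) -> list[tuple[str, int, str]]:
    """Return failures whose 'host' looks like a command-line flag.

    Assumption: a host starting with '-' suggests the real host argument
    was empty, so the next flag was taken as the hostname.
    """
    return [f for f in failures if f[0].startswith("-")]


# Sample lines copied from the log of test-redis-node-0.
LOG = """\
Could not connect to Redis at test-redis.redis-test.svc.cluster.local:26379: Connection refused
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at -p:6379: Name or service not known
"""

if __name__ == "__main__":
    failures = connection_failures(LOG)
    for host, port, reason in failures:
        print(f"{host}:{port} -> {reason}")
    print("flag-like hosts:", flag_like_hosts(failures))
```

On the sample lines, the second failure is flagged because its host is `-p`. That would be consistent with the former leader not having been told where the current leader is, but that is an interpretation, not something the log proves.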
Describe the bug
When restarting the leader pod, there is a possibility that the remaining nodes are unable to decide on a new leader.
Additional context
This should be fixed by bitnami/charts#7278 and further improved by bitnami/charts#7333.
Log of test-redis-node-1
The remaining nodes are unable to elect a new leader and keep trying to connect to the nonexistent former leader.
Expected behavior
Environment (please complete the following information):