Edit: I think I might have found at least part of the cause of this ... is passed into the leader state in the following line: I have tested this idea, but it did not fully fix the issue, so there might be more problems, as described below. There must be some task overhead, or the wrong dispatch loop waits for completions ... I am not sure. See the end of the issue for more.

Original question: During development I often observe my cluster going into a leader switch (ping pong) when I take a 3-member cluster and let the leader die. The two followers elect one of them as leader. The remaining follower then appears to think that the newly elected leader is dead (no heartbeat) and decides that it wants another election. It requests a pre-vote from the leader, who answers yes because the requester has a higher term. The other member is then all set to call an actual election: it votes for itself and gets the vote from the former leader. Now they have switched roles. Unfortunately, the new follower repeats the process and again seems to think the just-elected leader is dead ... and this keeps going on forever.

Using a very, very frequent heartbeat (0.1) minimizes the problem, but it feels like that should not be the way to fix it. I have tried the default settings for everything and with an ... It also appears that this is far more likely to happen when going from a cluster of 3 to 2. If a cluster of 4 is reduced to 3, it appears to be stable with the same settings. Right now I am using these settings:

```csharp
memberConfiguration.HeartbeatThreshold = 0.5;
memberConfiguration.LowerElectionTimeout = 2500;
memberConfiguration.UpperElectionTimeout = 5000;
memberConfiguration.RequestTimeout = TimeSpan.FromMinutes(1);
memberConfiguration.Partitioning = false;
```

In general I would expect this to stop after a few iterations, because eventually the randomly chosen timeout should make one of them win and keep the leader state. The heartbeats should be frequent enough to always arrive in time ... though it appears they don't?
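A quick sanity check of those numbers (a sketch only; it assumes the heartbeat period is derived as `LowerElectionTimeout * HeartbeatThreshold`, which is worth verifying against the dotNext version in use):

```csharp
// Hypothetical timing check for the settings above; the derivation of the
// heartbeat period is an assumption, not confirmed dotNext behavior.
var lowerElectionTimeout = TimeSpan.FromMilliseconds(2500);
var heartbeatPeriod = lowerElectionTimeout * 0.5;          // 1250 ms between heartbeat rounds
var requestTimeout = TimeSpan.FromMinutes(1);              // 60000 ms per in-flight request

Console.WriteLine(heartbeatPeriod < lowerElectionTimeout); // True: two rounds per timeout window
Console.WriteLine(requestTimeout < lowerElectionTimeout);  // False: one slow member can hold a
                                                           // blocking loop far past the timeout
```

If that derivation is right, the cadence itself looks fine on paper, which points the suspicion at request handling rather than the configured period.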
Steps to reproduce:

My debugging attempts: Whoever is elected leader must always send a heartbeat to all other members as the first thing he does; this is described in the Raft specification. I suspect the current implementation does not do this and instead waits one heartbeat interval before sending them. That would mean the other cluster member experiences a race between timing out and receiving the heartbeat from the just-elected leader, making it very likely to consider the newly elected member "dead on arrival". Or heartbeats are simply not sent often enough.

Here are some of the adjustments I did to debug, and the outputs. I added console output to the configured `DoHeartbeats` loop:

```csharp
private async Task DoHeartbeats(TimeSpan period, IAuditTrail<IRaftLogEntry> auditTrail, IClusterConfigurationStorage configurationStorage, CancellationToken token)
{
    using var cancellationSource = token.LinkTo(LeadershipToken);

    // reuse this buffer to place responses from other nodes
    using var taskBuffer = new AsyncResultSet(Members.Count);

    Console.WriteLine($"{DateTime.Now}: Starting sending heartbeats with interval of {period}");

    for (var forced = false; await DoHeartbeats(taskBuffer, auditTrail, configurationStorage, token).ConfigureAwait(false); forced = await WaitForReplicationAsync(period, token).ConfigureAwait(false))
    {
        Console.WriteLine($"{DateTime.Now}: WaitForReplicationAsync completed, forced={forced}");

        if (forced)
            DrainReplicationQueue();

        taskBuffer.Clear(true);
        Console.WriteLine($"{DateTime.Now}: Beginning DoHeartbeats loop again");
    }
}
```
And another console log in the overload:

```csharp
private async Task<bool> DoHeartbeats(AsyncResultSet taskBuffer, IAuditTrail<IRaftLogEntry> auditTrail, IClusterConfigurationStorage configurationStorage, CancellationToken token)
{
    Console.WriteLine($"{DateTime.Now}: Added task to send heartbeat to a member");
    ...
}
```

This resulted in the following log on the leader:
Whatever causes it, the guarantee must be that a) the heartbeats are configured to be sent faster than the lower election timeout, and b) whatever processing time is attached to them does not stop them from being queued to be sent at the fixed rate (see the sketch below). Sorry for the wall of text. Here is a cookie 🍪
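To make requirement (b) concrete, here is a minimal sketch of a dispatch loop that keeps a fixed cadence by never awaiting slow followers inline. This is not dotNext's actual implementation; `sendHeartbeatAsync` is a hypothetical stand-in for the real AppendEntries call:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class HeartbeatSketch
{
    // Sends one heartbeat round per tick. A follower that needs the full
    // RequestTimeout (one minute in the setup above) to fail does not delay
    // the next round, because its task is not awaited inside the loop.
    public static async Task RunAsync(
        IReadOnlyList<string> memberIds,
        Func<string, CancellationToken, Task> sendHeartbeatAsync, // hypothetical transport call
        TimeSpan period,
        CancellationToken token)
    {
        using var timer = new PeriodicTimer(period);
        while (await timer.WaitForNextTickAsync(token).ConfigureAwait(false))
        {
            foreach (var id in memberIds)
            {
                _ = sendHeartbeatAsync(id, token); // fire-and-forget per member
            }
        }
    }
}
```

By contrast, a loop that awaits all responses before scheduling the next round inherits the slowest member's latency, which with a one-minute `RequestTimeout` easily exceeds the 2500 ms lower election timeout.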
---
@Arkensor , request timeout is too high in your setup. On Windows, in the case of localhost communication, the leader has to wait out this timeout to detect an unavailable member; this is the expected behavior of a TCP socket on Windows. On Linux, it behaves differently. Anyway, you should set the request timeout lower than `lowerElectionTimeout`.
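If that diagnosis is right, the configuration-side fix looks roughly like this (values are illustrative, reusing the numbers from the question):

```csharp
// RequestTimeout below LowerElectionTimeout, so an unreachable member fails
// fast enough that heartbeats keep flowing within the election window.
memberConfiguration.HeartbeatThreshold = 0.5;
memberConfiguration.LowerElectionTimeout = 2500;
memberConfiguration.UpperElectionTimeout = 5000;
memberConfiguration.RequestTimeout = TimeSpan.FromSeconds(2); // was FromMinutes(1)
memberConfiguration.Partitioning = false;
```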
---
Probably, that may be an issue. I can fix it quickly.