You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The commitIndex of a node in the "follower" state may decrease, which violates the Raft paper's description that the commitIndex should only increase monotonically.
#54
Open
liang636600 opened this issue
Jan 12, 2025
· 0 comments
Hi, @wenweihu86@loveheaven@guohao@wangwg1 , I have observed that in certain cases, the commitIndex of a node in the "follower" state may decrease, which violates the Raft paper's description that the commitIndex should only increase monotonically. This issue could potentially lead to inconsistencies in the system state. I will now explain my findings in detail.
How to trigger this bug
To clearly illustrate the issue in this example, we have made some simplifications. For instance, the leader election process is not shown in Figure 1. After the leader election, n1 becomes the leader, and both n1 and n2 are in term 1. It is a three-node system, but node 3 is not shown in Figure 1.
Here is a brief description of the process: First, n1 receives a write request from a client and writes the value 5 into its log. Then, it sends log synchronization requests (AEReq) to n2 via actions 2 and 4 (the content of the AEReq messages is shown in the table at Action 2, 4). Note that the AEReq message in action 4 is delayed. n2 writes the log entry with value 5 to its own log and then processes the AEReq from action 2 through action 3, setting its commitIndex = 0. Next, n1 processes n2’s response and sets its own commitIndex = 1 through action 5. After that, n1 receives a new write request from the client, writes the value 6 to its log, and sends log synchronization requests (AEReq) to n2 via actions 7(the content of the AEReq messages is shown in the table at Action 7). n2 writes the log entry with value 6 to its log and sets its commitIndex = 1 via action 8. Finally, n2 receives the delayed AEReq message sent by n1 in action 4, processes it via action 9, and sets its commitIndex = 0.
Through this entire process, we can observe that the follower node n2 sets its commitIndex to 1 after action 8. However, after action 9, n2 sets its commitIndex to 0, which results in a decrease in commitIndex. This violates the Raft paper's description that commitIndex should only increase monotonically (as shown in Figure 2).
Figure 1. An Example Triggering the Decrease of commitIndex on a Follower Node.
Figure 2. The description of commitIndex in the Raft paper.
The root cause
The root cause of this issue is that the RaftJava code does not strictly conform the description in the Raft paper when processing AppendEntriesRequest (AEReq) messages. As shown in Figure 3, the Raft paper specifies the following rule for handling AEReq: "If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of the last new entry)." However, in this RaftJava code, as shown in the diagram below, the commitIndex is updated directly without the check leaderCommit > commitIndex, which ultimately leads to the issue of commitIndex decreasing.
Figure 3. The description in the Raft paper regarding how to handle AEReq.
Suggested fix
Once the root cause of the issue is identified, fixing it is straightforward. It is only necessary to update commitIndex while processing AEReq messages according to the instructions in the Raft paper. Specifically, modify the advanceCommitIndex method in com.github.wenweihu86.raft.service.impl.RaftConsensusServiceImpl within the RaftJava project. The commitIndex should only be updated when the condition leaderCommit > commitIndex is satisfied. The specific modification is as follows:
Hi, @wenweihu86 @loveheaven @guohao @wangwg1 , I have observed that in certain cases, the
commitIndex
of a node in the "follower" state may decrease, which violates the Raft paper's description that thecommitIndex
should only increase monotonically. This issue could potentially lead to inconsistencies in the system state. I will now explain my findings in detail.How to trigger this bug
To clearly illustrate the issue in this example, we have made some simplifications. For instance, the leader election process is not shown in Figure 1. After the leader election, n1 becomes the leader, and both n1 and n2 are in term 1. It is a three-node system, but node 3 is not shown in Figure 1.
Here is a brief description of the process: First, n1 receives a write request from a client and writes the value 5 into its log. Then, it sends log synchronization requests (AEReq) to n2 via actions 2 and 4 (the content of the AEReq messages is shown in the table at Action 2, 4). Note that the AEReq message in action 4 is delayed. n2 writes the log entry with value 5 to its own log and then processes the AEReq from action 2 through action 3, setting its
commitIndex = 0
. Next, n1 processes n2’s response and sets its owncommitIndex = 1
through action 5. After that, n1 receives a new write request from the client, writes the value 6 to its log, and sends log synchronization requests (AEReq) to n2 via actions 7(the content of the AEReq messages is shown in the table at Action 7). n2 writes the log entry with value 6 to its log and sets itscommitIndex = 1
via action 8. Finally, n2 receives the delayed AEReq message sent by n1 in action 4, processes it via action 9, and sets itscommitIndex = 0
.Through this entire process, we can observe that the follower node n2 sets its
commitIndex
to 1 after action 8. However, after action 9, n2 sets itscommitIndex
to 0, which results in a decrease incommitIndex
. This violates the Raft paper's description thatcommitIndex
should only increase monotonically (as shown in Figure 2).Figure 1. An Example Triggering the Decrease of
commitIndex
on a Follower Node.Figure 2. The description of commitIndex in the Raft paper.
The root cause
The root cause of this issue is that the RaftJava code does not strictly conform the description in the Raft paper when processing AppendEntriesRequest (AEReq) messages. As shown in Figure 3, the Raft paper specifies the following rule for handling AEReq: "If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of the last new entry)." However, in this RaftJava code, as shown in the diagram below, the
commitIndex
is updated directly without the checkleaderCommit > commitIndex
, which ultimately leads to the issue ofcommitIndex
decreasing.raft-java/raft-java-core/src/main/java/com/github/wenweihu86/raft/service/impl/RaftConsensusServiceImpl.java
Lines 312 to 315 in 50761c6
Figure 3. The description in the Raft paper regarding how to handle AEReq.
Suggested fix
Once the root cause of the issue is identified, fixing it is straightforward. It is only necessary to update
commitIndex
while processing AEReq messages according to the instructions in the Raft paper. Specifically, modify theadvanceCommitIndex
method incom.github.wenweihu86.raft.service.impl.RaftConsensusServiceImpl
within the RaftJava project. ThecommitIndex
should only be updated when the conditionleaderCommit > commitIndex
is satisfied. The specific modification is as follows:Thank you for taking the time to read this. I'm looking forward to your confirmation, and would be happy to help fix the issue if needed.
The text was updated successfully, but these errors were encountered: