Document checkpointing

rabbitmq · Jan 26, 2024 · f518d16
1 parent 07c9b61
Showing 3 changed files with 36 additions and 2 deletions.
diff --git a/docs/internals/INTERNALS.md b/docs/internals/INTERNALS.md
@@ -188,6 +188,27 @@ It is not guaranteed that a snapshot will be taken. A decision to take
 a snapshot or to delay it is taken using a number of internal Ra state factors.
 The goal is to minimise disk I/O activity when possible.
 
+### Checkpointing
+
+Checkpoints are nearly the same concept as snapshots. Snapshotting truncates
+the log up to the snapshot's index, which might be undesirable for machines
+which read from the log with the `{log, Indexes, Fun}` effect mentioned above.
+
+The `{checkpoint, RaftIndex, MachineState}` effect can be used as a hint to
+trigger a checkpoint. Like snapshotting, this effect is evaluated on all nodes
+and when a checkpoint is taken, the machine state is saved to disk and can be
+used for recovery when the machine restarts. A checkpoint being written does
+not trigger any log truncation though.
+
+The `{release_cursor, RaftIndex}` effect can then be used to promote the most
+recent checkpoint with an index lower than or equal to `RaftIndex` into a proper
+snapshot; log entries older than that checkpoint's index are then truncated.
+
+These two effects are intended for machines that use the `{log, Indexes, Fun}`
+effect and can substantially improve machine recovery time compared to
+snapshotting alone, especially when the machine needs to keep old log entries
+around for a long time.
+
 ## State Machine Versioning
 
 It is eventually necessary to make changes to the state machine

diff --git a/docs/internals/STATE_MACHINE_TUTORIAL.md b/docs/internals/STATE_MACHINE_TUTORIAL.md
@@ -218,3 +218,11 @@ or similar.
 To (potentially) trigger a snapshot return the `{release_cursor, RaftIndex, MachineState}`
 effect. This is why the raft index is included in the `apply/3` function. Ra will
 only create a snapshot if doing so will result in log segments being deleted.
+
+For machines that must keep log segments on disk for some time, the
+`{checkpoint, RaftIndex, MachineState}` effect can be used. This creates a
+snapshot-like view of the machine state on disk but doesn't trigger log
+truncation. Checkpoints can later be promoted to snapshots and trigger log
+truncation by emitting a `{release_cursor, RaftIndex}` effect. The most
+recent checkpoint with an index smaller than or equal to `RaftIndex` will be
+promoted.
diff --git a/src/ra_machine.erl b/src/ra_machine.erl
@@ -146,8 +146,13 @@
 %% forcing a GC run.
 %%
 %% Although both leaders and followers will process the same commands, effects
-%% are typically only applied on the leader. The only exception to this is
-%% the `release_cursor' and `garbage_collect' effects. The former is realised on all
+%% are typically only applied on the leader. The only exceptions to this are:
+%% <ul>
+%% <li>`release_cursor'</li>
+%% <li>`checkpoint'</li>
+%% <li>`garbage_collect'</li>
+%% </ul>
+%% The former two are realised on all
 %% nodes as they are part of the Ra implementation's log truncation mechanism.
 %% The `garbage_collect' effect is used to explicitly trigger a GC run
 %% in the Ra server's process.
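
For readers of this change, the following is a minimal sketch of how a machine might emit the two effects documented above. It is not taken from the Ra code base: the module name, the `{add, N}` and `{release_up_to, Idx}` commands, and the checkpoint interval are all hypothetical, and it assumes the metadata map passed to `apply/3` carries the Raft index under the `index` key. The `-behaviour(ra_machine)` declaration is left out so the module compiles without Ra on the code path.

```erlang
-module(checkpoint_sketch).
-export([init/1, apply/3]).

%% apply/3 shares its name with an auto-imported BIF; this directive
%% keeps the local definition unambiguous.
-compile({no_auto_import, [apply/3]}).

%% Hypothetical interval, chosen only for illustration.
-define(CHECKPOINT_EVERY, 100).

%% Initial machine state: a single counter.
init(_Config) ->
    #{count => 0}.

%% Hypothetical `{add, N}' command: update the state and, every
%% ?CHECKPOINT_EVERY indexes, hint that a checkpoint could be taken.
%% A checkpoint saves the state to disk without truncating the log.
apply(#{index := Idx}, {add, N}, #{count := C} = State0) ->
    State = State0#{count := C + N},
    Effects = case Idx rem ?CHECKPOINT_EVERY of
                  0 -> [{checkpoint, Idx, State}];
                  _ -> []
              end,
    {State, ok, Effects};
%% Hypothetical `{release_up_to, Idx}' command: the machine no longer
%% needs log entries up to Idx, so it asks for the most recent
%% checkpoint at or below Idx to be promoted to a snapshot.
apply(_Meta, {release_up_to, Idx}, State) ->
    {State, ok, [{release_cursor, Idx}]}.
```

With these assumptions, a command applied at index 100 returns a `{checkpoint, 100, State}` effect, and a later `{release_up_to, 100}` command returns `{release_cursor, 100}`, which is the point at which log truncation may happen.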