
Commit

Document checkpointing
the-mikedavis committed Jan 26, 2024
1 parent 07c9b61 commit f518d16
Showing 3 changed files with 36 additions and 2 deletions.
21 changes: 21 additions & 0 deletions docs/internals/INTERNALS.md
@@ -188,6 +188,27 @@ It is not guaranteed that a snapshot will be taken. A decision to take
a snapshot or to delay it is taken using a number of internal Ra state factors.
The goal is to minimise disk I/O activity when possible.

### Checkpointing

Checkpoints are nearly the same concept as snapshots. The difference is that
snapshotting truncates the log up to the snapshot's index, which can be
undesirable for machines that read from the log with the `{log, Indexes, Fun}`
effect mentioned above.

The `{checkpoint, RaftIndex, MachineState}` effect can be used as a hint to
trigger a checkpoint. Like snapshotting, this effect is evaluated on all nodes.
When a checkpoint is taken, the machine state is saved to disk and can be
used for recovery when the machine restarts. Unlike a snapshot, writing a
checkpoint does not trigger any log truncation.

The `{release_cursor, RaftIndex}` effect can then be used to promote the most
recent existing checkpoint with an index lower than or equal to `RaftIndex`
into a proper snapshot. Log entries older than that checkpoint's index are
then truncated.

These two effects are intended for machines that use the `{log, Indexes, Fun}`
effect and can substantially improve machine recovery time compared to
snapshotting alone, especially when the machine needs to keep old log entries
around for a long time.
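
The promotion rule described above can be sketched as a pure function. This is only an illustration, not Ra's implementation: the module name, the function name `promotable/2`, and the modelling of checkpoints as a plain list of Raft indexes are all assumptions made for the example.

```erlang
-module(checkpoint_promotion_sketch).
-export([promotable/2]).

%% Hypothetical helper, not part of Ra's API.
%% Checkpoints are modelled as a list of Raft indexes (an assumption).
%% Given the RaftIndex carried by a {release_cursor, RaftIndex} effect,
%% return the index of the checkpoint that would be promoted to a snapshot,
%% or `none' if no checkpoint is old enough.
promotable(CheckpointIndexes, RaftIndex) ->
    case [Idx || Idx <- CheckpointIndexes, Idx =< RaftIndex] of
        [] ->
            %% Every checkpoint is newer than RaftIndex: nothing to promote.
            none;
        Eligible ->
            %% The most recent eligible checkpoint is the one promoted.
            {ok, lists:max(Eligible)}
    end.
```

For example, with checkpoints at indexes 10, 20 and 30, a release cursor at index 25 selects the checkpoint at index 20, and only log entries older than index 20 become eligible for truncation.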

## State Machine Versioning

It is eventually necessary to make changes to the state machine
8 changes: 8 additions & 0 deletions docs/internals/STATE_MACHINE_TUTORIAL.md
@@ -218,3 +218,11 @@ or similar.
To (potentially) trigger a snapshot, return the `{release_cursor, RaftIndex, MachineState}`
effect. This is why the Raft index is included in the `apply/3` function. Ra will
only create a snapshot if doing so will result in log segments being deleted.

For machines that must keep log segments on disk for some time, the
`{checkpoint, RaftIndex, MachineState}` effect can be used. This creates a
snapshot-like view of the machine state on disk but doesn't trigger log
truncation. A checkpoint can later be promoted to a snapshot, which does
trigger log truncation, by emitting a `{release_cursor, RaftIndex}` effect.
The most recent checkpoint with an index smaller than or equal to `RaftIndex`
will be promoted.
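
As a rough sketch of how a machine might combine the two effects, the module below follows the shape of the `ra_machine` `init/1` and `apply/3` callbacks but compiles without Ra. The module name, the `enqueue` and `settle` commands, the `pending` state key, and the checkpoint interval of 1000 entries are all made up for illustration. It also assumes the machine keeps only the indexes of entries it still needs and reads their payloads back from the log later.

```erlang
-module(checkpointing_machine_sketch).
-export([init/1, apply/3]).

%% Hypothetical interval: hint a checkpoint every 1000 applied entries.
-define(CHECKPOINT_INTERVAL, 1000).

init(_Config) ->
    %% `pending' holds the Raft indexes of entries the machine still needs.
    #{pending => []}.

%% `enqueue' and `settle' are made-up commands for this example.
apply(#{index := RaftIndex}, {enqueue, _Payload}, #{pending := Pending} = State0) ->
    %% Keep only the index; the payload stays in the log.
    State = State0#{pending := Pending ++ [RaftIndex]},
    {State, ok, checkpoint_effects(RaftIndex, State)};
apply(#{index := RaftIndex}, settle, #{pending := [_Oldest | Rest]} = State0) ->
    State = State0#{pending := Rest},
    %% Entries below the oldest index still pending are no longer needed,
    %% so the release cursor can advance to just before it.
    ReleaseIdx = case Rest of
                     [Next | _] -> Next - 1;
                     [] -> RaftIndex
                 end,
    Effects = checkpoint_effects(RaftIndex, State)
              ++ [{release_cursor, ReleaseIdx}],
    {State, ok, Effects};
apply(_Meta, settle, State) ->
    %% Nothing pending: no state change and no effects.
    {State, {error, empty}}.

%% Emit a checkpoint hint only at the chosen interval.
checkpoint_effects(RaftIndex, State)
  when RaftIndex rem ?CHECKPOINT_INTERVAL =:= 0 ->
    [{checkpoint, RaftIndex, State}];
checkpoint_effects(_RaftIndex, _State) ->
    [].
```

Checkpoints are hinted at a fixed interval so that recovery does not have to replay the whole retained log, while the release cursor only advances past entries that the machine no longer references.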
9 changes: 7 additions & 2 deletions src/ra_machine.erl
@@ -146,8 +146,13 @@
%% forcing a GC run.
%%
%% Although both leaders and followers will process the same commands, effects
-%% are typically only applied on the leader. The only exception to this is
-%% the `release_cursor' and `garbage_collect' effects. The former is realised on all
+%% are typically only applied on the leader. The only exceptions to this are:
+%% <ul>
+%% <li>`release_cursor'</li>
+%% <li>`checkpoint'</li>
+%% <li>`garbage_collect'</li>
+%% </ul>
+%% The former two are realised on all
%% nodes as they are part of Ra's log truncation mechanism.
%% The `garbage_collect' effect is used to explicitly trigger a GC run
%% in the Ra server process.

0 comments on commit f518d16
