docs wip
kjnilsson committed Jan 9, 2025
1 parent f9c3c98 commit 08610a0
Showing 4 changed files with 90 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -33,3 +33,4 @@ doc/*
/bazel-*

/.vscode/
.DS_store
89 changes: 89 additions & 0 deletions docs/internals/COMPACTION.md
@@ -0,0 +1,89 @@
# Ra log compaction

This is a living document capturing current work on log compaction.

## Overview


Compaction in Ra is intrinsically linked to the snapshotting
feature. Standard Raft snapshotting removes all entries in the Ra log
that precede the snapshot, because the snapshot is a full representation of
the state machine state.


### Ra Server log worker responsibilities

* Write checkpoints and snapshots
* Perform compaction runs
* Report segments to be deleted back to the Ra server. NB: the worker does
not perform the segment deletion itself; it needs to report changes back to the
Ra server first. The Ra server log worker maintains its own list of segments
to avoid processing a segment twice.


```mermaid
sequenceDiagram
participant segment-writer
participant ra-server
participant ra-server-log
segment-writer--)ra-server: new segments
ra-server-)+ra-server-log: new segments
ra-server-log->>ra-server-log: phase 1 compaction
ra-server-log-)-ra-server: segment changes (new, to be deleted)
ra-server-)+ra-server-log: new snapshot
ra-server-log->>ra-server-log: write snapshot
ra-server-log->>ra-server-log: phase 1 compaction
ra-server-log-)-ra-server: snapshot written, segment changes
```
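
The hand-off in the diagram above can be sketched as follows. This is a minimal illustration only, not Ra's implementation: the `LogWorker` class, its method names, and the shape of the report are all hypothetical. It shows the two properties described above: the worker keeps its own list of segments so that nothing is processed twice, and it reports deletions instead of performing them.

```python
class LogWorker:
    """Hypothetical stand-in for the Ra server log worker."""

    def __init__(self):
        # The worker's own list of segments it has already been told about.
        self.known_segments = set()

    def handle_new_segments(self, segments):
        """Record incoming segments and return only the ones not seen before."""
        fresh = [s for s in segments if s not in self.known_segments]
        self.known_segments.update(fresh)
        return fresh

    def report_changes(self, new, to_delete):
        """Build the report sent back to the Ra server.

        No file is deleted here: deletion is left to the receiver,
        after it has updated its own list of segments.
        """
        return {"new": list(new), "to_be_deleted": list(to_delete)}
```

Passing the same segment twice to `handle_new_segments` returns it only the first time, which is what protects against double processing in this sketch.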

### Log sections

#### Normal log section

The normal log section is the contiguous log that follows the last snapshot.

#### Compacting log section

The compacting log section consists of all live Raft indexes that are lower
than or equal to the index of the last snapshot taken.
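
As a small illustration of the two definitions, the sketch below partitions a set of live indexes around a snapshot index. The function name and arguments are hypothetical and only model the definitions; they are not part of Ra.

```python
def split_log_sections(live_indexes, snapshot_index):
    """Partition live Raft indexes into (compacting, normal) sections.

    compacting: live indexes <= snapshot_index
    normal:     indexes > snapshot_index (the log that follows the snapshot)
    """
    compacting = sorted(i for i in live_indexes if i <= snapshot_index)
    normal = sorted(i for i in live_indexes if i > snapshot_index)
    return compacting, normal
```

For example, with live indexes `{3, 7, 10, 11, 12}` and a snapshot at index 10, indexes 3, 7 and 10 fall in the compacting section and 11 and 12 in the normal section.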

![compaction](compaction1.jpg)

### Compacted segments: naming (phase 3 compaction)

Segment files in a Ra log have numeric names incremented as they are written.
This is essential as the order is required to ensure log integrity.

Desired properties of phase 3 compaction:

* Retain immutability: entries will never be deleted from a segment. Instead, the
entries to be retained will be written to a new segment.
* Lexicographic sorting of file names needs to be consistent with the order of writes
* Compaction walks from older segments to newer ones
* Easy to recover after an unclean shutdown

Segments will be compacted when 2 or more adjacent segments fit into a single
segment.
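
One way to read this rule is as a greedy walk from old to new that groups neighbouring segments while their combined live entries still fit into one segment. The sketch below assumes, purely for illustration, that capacity is measured as an entry count (`max_entries`) and that a live-entry count is known per segment; neither detail is specified here.

```python
def plan_compaction_groups(live_counts, max_entries):
    """Greedily group adjacent segments whose live entries fit in one segment.

    live_counts: list of (segment_number, live_entry_count), oldest first.
    Returns a list of groups (lists of segment numbers), each with 2+ members.
    """
    groups = []
    current, total = [], 0
    for seg, live in live_counts:
        if current and total + live > max_entries:
            # The running group is full: keep it only if it merges 2+ segments.
            if len(current) >= 2:
                groups.append(current)
            current, total = [], 0
        current.append(seg)
        total += live
    if len(current) >= 2:
        groups.append(current)
    return groups
```

With live counts `[(1, 100), (2, 300), (3, 500), (4, 900), (5, 50)]` and `max_entries=1000`, segments 1 to 3 form one group and segments 4 and 5 another.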

The new segment will have the naming format `OLD-NEW.segment`, where `OLD` and
`NEW` are the numbers of the oldest and newest source segments.

This means that a single segment can only be compacted once, e.g.
`001.segment -> 001-001.segment`, as after this there is no new name available
and it has to wait until it can be compacted with the adjacent segment. Single
segment compaction could be optional and only triggered when a substantial share,
say 75% or more, of the entries or data can be deleted.
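
A minimal sketch of the naming scheme, assuming zero-padded numbers of a fixed width (three digits here, matching the examples in this document; the real width is not stated). The helper name is hypothetical.

```python
def compacted_segment_name(oldest, newest, width=3):
    """Return the OLD-NEW.segment name for a compacted range of segments.

    `width` is an assumption taken from the 3-digit examples in this document.
    """
    if oldest > newest:
        raise ValueError("oldest segment number must not exceed newest")
    return f"{oldest:0{width}d}-{newest:0{width}d}.segment"
```

`compacted_segment_name(1, 1)` gives `001-001.segment` and `compacted_segment_name(2, 4)` gives `002-004.segment`. Because the numbers are fixed width, a plain string sort of `001.segment`, `002-004.segment` and `005.segment` keeps them in write order.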

This naming format means it is easy to identify dead segments after an unclean
exit.
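
One plausible way to do this, sketched under assumptions: parse the number range out of each file name and treat a segment as dead when a compacted file covers its range. The regular expression, function names, and the exact "dead" rule below are illustrative and not taken from Ra.

```python
import re

# Matches "NNN.segment" and "NNN-MMM.segment".
SEGMENT_RE = re.compile(r"^(\d+)(?:-(\d+))?\.segment$")


def parse_range(name):
    """Return (old, new, is_compacted) for a segment file name, or None."""
    m = SEGMENT_RE.match(name)
    if m is None:
        return None
    old = int(m.group(1))
    new = int(m.group(2)) if m.group(2) is not None else old
    return old, new, m.group(2) is not None


def find_dead_segments(names):
    """Return the segment files superseded by a compacted segment."""
    parsed = {n: parse_range(n) for n in names}
    parsed = {n: p for n, p in parsed.items() if p is not None}
    dead = []
    for name, (old, new, compacted) in parsed.items():
        for other, (o_old, o_new, o_compacted) in parsed.items():
            if other == name or not o_compacted:
                continue
            covers = o_old <= old and new <= o_new
            strictly_wider = (o_new - o_old) > (new - old)
            # Dead if a wider compacted file covers it, or if a plain
            # segment has a compacted twin (001.segment vs 001-001.segment).
            if covers and (strictly_wider or not compacted):
                dead.append(name)
                break
    return sorted(dead)
```

Given `001.segment` through `005.segment` plus `002-004.segment`, this reports `002.segment`, `003.segment` and `004.segment` as dead.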

During compaction a different extension will be used, e.g. `002-004.compacting`,
and after an unclean shutdown any such files will be removed. Once synced, the file
will be renamed to `.segment`, and some time later the source files will be deleted
(once the Ra server has updated its list of segments).
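
The write, sync, rename sequence and the matching recovery step can be sketched as below. This is an illustration of the general pattern using Python's standard library, not Ra's code; the function names are hypothetical and the segment contents are treated as opaque bytes.

```python
import os
from pathlib import Path


def publish_compacted_segment(directory, base_name, data):
    """Write data to BASE.compacting, fsync it, then rename to BASE.segment."""
    directory = Path(directory)
    tmp = directory / f"{base_name}.compacting"
    final = directory / f"{base_name}.segment"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the contents are on disk first
    os.replace(tmp, final)  # rename only after the sync has completed
    # Source segments are deliberately left alone here; they are deleted
    # later, once the server's list of segments has been updated.
    return final


def remove_incomplete_compactions(directory):
    """Recovery step: delete any leftover .compacting files."""
    removed = []
    for path in sorted(Path(directory).glob("*.compacting")):
        path.unlink()
        removed.append(path.name)
    return removed
```

Because a `.segment` file only ever appears through the rename, any `.compacting` file found at startup can be treated as incomplete and discarded.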


![segments](compaction2.jpg)



Binary file added docs/internals/compaction1.jpg
Binary file added docs/internals/compaction2.jpg
