Update Replication Docs #1055

Draft
wants to merge 2 commits into
base: main
25 changes: 22 additions & 3 deletions website/docs/cluster/replication.md
@@ -48,10 +48,10 @@ In addition, newly configured replicas, added to the cluster, could face longer
The cluster operator can choose between various replication options to achieve a trade-off between performance and durability.
A summary of these options is shown below:

- Main Memory Replication (MMR)
- Fast AOF Truncation (FAT)
This option forces the primary to aggressively truncate the AOF so that it does not spill to disk. It can be used in combination with the aof-memory option, which determines the maximum AOF memory buffer size.
When a replica attaches to a primary with MMR turned on, the AOF is not guaranteed to be truncated which may result in writes being lost.
To overcome this issue MMR should be used with ODC.
When a replica attaches to a primary with FAT turned on, the AOF is not guaranteed to be truncated, which may result in writes being lost.
To overcome this issue, FAT should be used with ODC.
- On Demand Checkpoint (ODC)
This option forces the primary to take a checkpoint if no checkpoint is available when a replica tries to attach and recover. If a checkpoint becomes or was available and the AOF has not been truncated, then
the primary will lock it to prevent truncation while a replica is recovering. In this case, the AOF log could spill to disk as the in-memory AOF buffer becomes full.
@@ -226,6 +226,25 @@ replica_announced:1
192.168.1.26:7001>
```

# Diskless Replication

When the AOF gets truncated, full synchronization requires taking a checkpoint and sending that checkpoint over to the attaching replica.
This operation can be expensive because it involves multiple I/O operations at the primary and replica.
For this reason, we added a variant of full synchronization called diskless replication.
This is implemented using a streaming checkpoint that allows clients to continue issuing reads and writes at the primary while attaching replicas synchronize.
To enable diskless replication, the server needs to be started with the following flags:

--repl-diskless-sync=true
This flag enables diskless replication.

--repl-diskless-sync-delay=\<seconds\>
This flag sets how many seconds the primary waits before starting the full sync, which gives multiple replicas the opportunity to attach and receive the same streaming checkpoint.
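
To make the effect of the delay concrete, here is a minimal Python sketch of how attach times might be grouped into shared full syncs. The windowing rule is an assumption made for illustration (the first attach opens a window of `delay` seconds and later attaches inside it join the same sync); it is not taken from the implementation, and `group_full_syncs` is a hypothetical helper name.

```python
from typing import List, Optional


def group_full_syncs(attach_times: List[float], delay: float) -> List[List[float]]:
    """Group replica attach times into batches that share one streaming checkpoint.

    Assumed model (illustrative only): the first attach opens a window of
    `delay` seconds; every replica attaching inside that window joins the same
    full sync; an attach after the window closes opens a new one.
    """
    batches: List[List[float]] = []
    window_end: Optional[float] = None
    for t in sorted(attach_times):
        if window_end is None or t > window_end:
            # No open window: this replica triggers a new full sync.
            batches.append([t])
            window_end = t + delay
        else:
            # Window still open: this replica shares the pending full sync.
            batches[-1].append(t)
    return batches


if __name__ == "__main__":
    # Three replicas attach within 5 seconds of each other, a fourth much later.
    print(group_full_syncs([0.0, 1.5, 4.0, 12.0], delay=5.0))
```

Under this model, a larger delay trades a slower first sync for fewer streaming checkpoints when many replicas attach at about the same time.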

There are no additional requirements beyond the aforementioned flags to leverage diskless replication.
The APIs for mapping replicas remain the same (i.e., CLUSTER REPLICATE, REPLICAOF, etc.).
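
Since the replica-mapping commands are unchanged, a client sends them as ordinary RESP arrays of bulk strings. The sketch below only shows that wire encoding; the host, port, and `<primary-node-id>` placeholder are hypothetical values, and `encode_resp_command` is an illustrative helper rather than part of any client library.

```python
from typing import Sequence


def encode_resp_command(args: Sequence[str]) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n" % len(data))
        parts.append(data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    # Hypothetical primary address; substitute the real host and port.
    print(encode_resp_command(["REPLICAOF", "127.0.0.1", "7000"]))
    # Hypothetical placeholder; substitute the primary's actual node id.
    print(encode_resp_command(["CLUSTER", "REPLICATE", "<primary-node-id>"]))
```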

Note that streaming replication does not take a checkpoint, so the AOF is not automatically truncated every time a full sync is performed (unless the FAT flag is used).
This ensures durability in the event of a failure, which would not be possible if the AOF were truncated without a persistent checkpoint.
However, the store version is incremented to ensure consistency across different instances that may be fully synced at different times.
Users can still use the SAVE/BGSAVE commands to take a manual checkpoint, which safely truncates the AOF.
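
The behaviour described in this note can be summarised with a small toy model. This is only a sketch of the stated rules, not the server's implementation: `PrimaryModel` and all of its fields are hypothetical names, and anything the note does not state (for example, whether a manual checkpoint also changes the store version) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrimaryModel:
    """Toy model of a primary's AOF and version state (illustrative only)."""

    fast_aof_truncate: bool = False        # stands in for the FAT option
    store_version: int = 1
    aof: List[str] = field(default_factory=list)
    persisted_checkpoint_version: int = 0  # 0 means no persistent checkpoint yet

    def write(self, entry: str) -> None:
        self.aof.append(entry)

    def diskless_full_sync(self) -> None:
        # Streaming sync: no persistent checkpoint is taken, but the version moves on.
        self.store_version += 1
        if self.fast_aof_truncate:
            self.aof.clear()

    def save(self) -> None:
        # Manual checkpoint (SAVE/BGSAVE): state is persisted, so truncation is safe.
        self.persisted_checkpoint_version = self.store_version
        self.aof.clear()


if __name__ == "__main__":
    primary = PrimaryModel()
    primary.write("SET a 1")
    primary.diskless_full_sync()
    print(primary.store_version, primary.aof)  # version bumped, AOF kept
    primary.save()
    print(primary.persisted_checkpoint_version, primary.aof)  # AOF truncated
```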