Delay machine upgrade until all Ra servers support it
[Why]
Before this patch, a Ra cluster would switch to a new machine version
immediately after a leader with that version was elected.

Because a leader can be elected with votes from only a quorum of
members, the cluster could start using the new machine version as soon
as a quorum of members supported that version.

Unfortunately, other members that do not support it stop applying
commands because they run an older version of the machine code. For some
consumers of Ra, like Khepri, this means they could cease their
operation locally until the member is restarted with the new machine
version.

We want to delay the machine upgrade to a point where all members know
about the new version. This ensures all members can continue to provide
their service.

[How]
The machine version to use is communicated by the leader using the
`noop` command. This command is the first one sent just after an
election. Before this patch, the machine version passed was the
leader's local machine version.

With this patch, the `noop` command sent after an election passes the
effective machine version, unless the leader is the only member of the
cluster (in which case it passes the latest machine version). In a
cluster with several members, the leader therefore sends a second `noop`
command with a newer machine version later, once all members support it.
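
As an illustration only, the choice of version for the first `noop` can
be sketched as below. The module and function names are hypothetical and
are not part of Ra; the cluster size is assumed to include the leader.

```erlang
-module(noop_version_sketch).
-export([initial_noop_version/3]).

%% Hypothetical helper, not a Ra function.
%% EffectiveVsn: machine version the cluster currently runs.
%% LatestVsn:    newest machine version the leader's code supports.
%% ClusterSize:  number of members, leader included (assumption).
-spec initial_noop_version(non_neg_integer(),
                           non_neg_integer(),
                           pos_integer()) -> non_neg_integer().
initial_noop_version(_EffectiveVsn, LatestVsn, 1) ->
    %% A lone leader has no follower to wait for.
    LatestVsn;
initial_noop_version(EffectiveVsn, _LatestVsn, ClusterSize)
  when ClusterSize > 1 ->
    %% Keep the current version until every member reports support.
    EffectiveVsn.
```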

To determine what each follower supports, this patch introduces two
commands:
* `#info_rpc{}`
* `#info_reply{}`

Once a leader is elected, in addition to the `noop` command, it sends an
`#info_rpc{}` command to all followers. They reply with `#info_reply{}`
with the machine version they support. This mechanism is not specific to
machine upgrades: this could be extended in the future to communicate
more details about each follower.

Once the leader has received the machine version of every follower, it
can determine the highest machine version supported by all members. For
that, it simply takes the lowest reported machine version (including the
leader's own machine version). If this version is greater than the
effective machine version, the leader sends a new `noop` command with
the new machine version to use.
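
A minimal sketch of that computation follows. It is not the code of
`get_max_supported_machine_version/1`: the module name, function names,
arguments and return values are assumptions made for illustration. Peers
are modelled as a map whose values may carry a `machine_version` key,
mirroring the optional field added to the peer state in this patch.

```erlang
-module(machine_version_sketch).
-export([max_supported_version/2, maybe_upgrade/3]).

%% Hypothetical helper: lowest version among the leader and all peers,
%% or `undefined' if at least one peer has not reported yet.
max_supported_version(LeaderVsn, Peers) when is_map(Peers) ->
    lists:foldl(
      fun(_PeerState, undefined) ->
              %% A previous peer had no report: stay undecided.
              undefined;
         (#{machine_version := Vsn}, Acc) ->
              erlang:min(Vsn, Acc);
         (_PeerState, _Acc) ->
              %% This peer has not reported a version yet.
              undefined
      end, LeaderVsn, maps:values(Peers)).

%% Hypothetical helper: decide whether a second `noop' is warranted.
maybe_upgrade(EffectiveVsn, LeaderVsn, Peers) ->
    case max_supported_version(LeaderVsn, Peers) of
        Vsn when is_integer(Vsn), Vsn > EffectiveVsn ->
            {upgrade, Vsn};
        _ ->
            no_upgrade
    end.
```

With a leader at version 2 and followers reporting 2 and 1, the result
is 1, so an upgrade is only proposed if the effective version is 0.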

At each "tick", the leader resends the `#info_rpc{}` command to every
follower that has not reported anything yet, or whose reported machine
version is lower than the leader's own supported machine version. This
takes care of followers that did not receive the initial `#info_rpc{}`
and of those that were restarted as part of an upgrade.
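
The selection of followers to query again on a tick could look like the
sketch below. The module and function names are hypothetical, and the
peer map has the same assumed shape as the peer state in this patch (an
optional `machine_version` key); the actual sending of `#info_rpc{}` is
left out.

```erlang
-module(info_rpc_tick_sketch).
-export([peers_to_query/2]).

%% Hypothetical helper: ids of the peers that should receive another
%% `#info_rpc{}' on this tick, sorted for a deterministic result.
peers_to_query(LeaderVsn, Peers) when is_map(Peers) ->
    lists:sort([Id || {Id, State} <- maps:to_list(Peers),
                      needs_info_rpc(LeaderVsn, State)]).

%% A peer is queried again if it reported an older version than the
%% leader supports, or if it has not reported anything yet.
needs_info_rpc(LeaderVsn, #{machine_version := Vsn}) ->
    Vsn < LeaderVsn;
needs_info_rpc(_LeaderVsn, _State) ->
    true.
```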

Fixes #490.

V2: Address comments from @kjnilsson:
    * Use an empty map by default in `#info_reply{}` instead of
      `undefined`. This simplifies the handling of the reply with a
      single `lists:foldl/3` instead of two.
    * Merge `has_enough_peer_info/1` into
      `get_max_supported_machine_version/1`.
    * Add a system-level option to restore the Ra 2.15 behavior.
dumbbell committed Jan 9, 2025
1 parent f5ac36b commit e9c82a0
Showing 6 changed files with 439 additions and 97 deletions.
14 changes: 13 additions & 1 deletion src/ra.hrl
@@ -70,7 +70,8 @@
           voter_status => ra_voter_status(),
           %% indicates that a snapshot is being sent
           %% to the peer
-          status := ra_peer_status()}.
+          status := ra_peer_status(),
+          machine_version => ra_machine:version()}.

 -type ra_cluster() :: #{ra_server_id() => ra_peer_state()}.

@@ -187,6 +188,17 @@
         {query_index :: integer(),
          term :: ra_term()}).

+-record(info_rpc,
+        {from :: ra_server_id(),
+         term :: ra_term(),
+         keys :: [ra_server:ra_server_info_key()]}).
+
+-record(info_reply,
+        {from :: ra_server_id(),
+         term :: ra_term(),
+         keys :: [ra_server:ra_server_info_key()],
+         info = #{} :: ra_server:ra_server_info()}).
+
 %% WAL defaults
 -define(WAL_DEFAULT_MAX_SIZE_BYTES, 256 * 1000 * 1000).
 -define(WAL_DEFAULT_MAX_BATCH_SIZE, 8192).