Delay machine upgrade until all members are up-to-date #493

Merged
kjnilsson merged 1 commit into main from delay-machine-upgrade
Jan 9, 2025

Conversation

dumbbell
Member

@dumbbell dumbbell commented Dec 24, 2024

Why

Before this patch, a Ra cluster would switch to a new machine version immediately after a leader with that version was elected.

Because a candidate needs only a quorum of votes to be elected leader, the cluster could start using the new machine version as soon as a quorum of members supported it.

Unfortunately, the other members, which do not support it, stop applying commands because they run an older version of the machine code. For some consumers of Ra, like Khepri, this means the service can stop working locally on such a member until it is restarted with the new machine version.
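
To make the failure mode concrete, here is a small Python model of a five-member cluster in which three members already run machine version 2. It is only an illustration: it is not Ra's Erlang code, and the member names, version numbers, and function names are invented for the example.

```python
# Hypothetical cluster: member name -> highest machine version its code supports.
supported = {"a": 2, "b": 2, "c": 2, "d": 1, "e": 1}


def quorum_size(member_count: int) -> int:
    """Smallest number of members that forms a majority."""
    return member_count // 2 + 1


def stalled_members(supported: dict, effective_version: int) -> list:
    """Members whose code is too old to apply commands at `effective_version`."""
    return sorted(m for m, v in supported.items() if v < effective_version)


# Behavior before the patch: a leader running version 2 can win with 3 of 5
# votes and immediately switches the cluster to its own local version.
upgraded = [m for m, v in supported.items() if v >= 2]
assert len(upgraded) >= quorum_size(len(supported))
print(stalled_members(supported, effective_version=2))  # ['d', 'e']
```

Members `d` and `e` are the ones that would stop applying commands until they are restarted with the new code.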

We want to delay the machine upgrade to a point where all members know about the new version. This ensures all members can continue to provide their service.

How

The machine version to use is communicated by the leader using the noop command. This command is the first one sent just after an election. Before this patch, the version it carried was the leader's local machine version.

With this patch, the noop command sent after an election passes the effective machine version, unless the leader is the only member of its cluster, in which case it passes the latest machine version. In a cluster with several members, the leader therefore sends a second noop command with a newer machine version later, once all members support it.
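
The rule for the first noop can be sketched as a single function. This is a Python illustration under the assumption that "latest machine version" means the version supported by the leader's local code; the function and parameter names are hypothetical and do not come from Ra.

```python
def initial_noop_version(effective_version: int,
                         local_version: int,
                         cluster_size: int) -> int:
    """Machine version carried by the first noop sent after an election."""
    if cluster_size == 1:
        # A lone member has nobody to wait for, so it can go straight to
        # the latest version its own code supports.
        return local_version
    # In a cluster, stay on the version already in effect. A later noop
    # raises it once every member is known to support something newer.
    return effective_version


print(initial_noop_version(effective_version=1, local_version=2, cluster_size=1))  # 2
print(initial_noop_version(effective_version=1, local_version=2, cluster_size=3))  # 1
```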

To determine what each follower supports, this patch introduces two commands:

  • #info_rpc{}
  • #info_reply{}

Once a leader is elected, in addition to the noop command, it sends an #info_rpc{} command to all followers. They reply with #info_reply{} with the machine version they support. This mechanism is not specific to machine upgrades: this could be extended in the future to communicate more details about each follower.
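
As a rough picture of the exchange, the sketch below models the two messages as Python dataclasses. The field names (`sender`, `info`) and the `machine_version` key are assumptions made for this example, not the actual record definitions; the payload is modeled as a key/value map because the mechanism is meant to carry more details in the future.

```python
from dataclasses import dataclass, field


@dataclass
class InfoRpc:
    sender: str  # hypothetical field: the leader asking for information


@dataclass
class InfoReply:
    sender: str  # hypothetical field: the follower answering
    info: dict = field(default_factory=dict)  # assumed key/value payload


@dataclass
class Follower:
    name: str
    machine_version: int

    def on_info_rpc(self, rpc: InfoRpc) -> InfoReply:
        # Report what this member's code supports; more keys could be
        # added later without changing the message shape.
        return InfoReply(sender=self.name,
                         info={"machine_version": self.machine_version})


followers = [Follower("b", 2), Follower("c", 1)]
rpc = InfoRpc(sender="a")
replies = {f.name: f.on_info_rpc(rpc).info for f in followers}
print(replies)  # {'b': {'machine_version': 2}, 'c': {'machine_version': 1}}
```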

Once the leader has received the machine version of every follower, it can determine the highest machine version that all members support. For that, it simply takes the lowest reported machine version (including the leader's own). If this version is greater than the effective machine version, the leader sends a new noop command with the new machine version to use.
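
The computation amounts to a minimum over the known versions, followed by a comparison with the effective version. Below is a minimal Python sketch of that logic; the function names are made up for this example and it is not a port of the Ra implementation. A follower that has not reported yet is represented by `None`.

```python
from typing import Optional


def max_supported_version(leader_version: int, reported: dict) -> Optional[int]:
    """Highest version every member supports, or None while a follower
    has not reported its version yet."""
    if any(version is None for version in reported.values()):
        return None
    # The lowest version across all members is the highest one they share.
    return min([leader_version, *reported.values()])


def version_for_second_noop(effective_version: int,
                            leader_version: int,
                            reported: dict) -> Optional[int]:
    """Version to announce in a new noop, or None if no upgrade is possible."""
    candidate = max_supported_version(leader_version, reported)
    if candidate is not None and candidate > effective_version:
        return candidate
    return None


# One follower is still on version 1: the cluster stays where it is.
print(version_for_second_noop(1, 2, {"b": 2, "c": 1}))  # None
# Every member supports version 2: the leader can announce it.
print(version_for_second_noop(1, 2, {"b": 2, "c": 2}))  # 2
```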

At each "tick", the leader sends the #info_rpc{} command again to every follower that has not reported anything yet or whose reported machine version is lower than the leader's own supported machine version. This takes care of followers that did not receive the initial #info_rpc{} and of those that were restarted as part of an upgrade.
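
The selection of followers to query again at each tick can be sketched as a simple filter. As with the other examples, this is an illustrative Python model with a hypothetical function name, where `None` stands for "no report received yet".

```python
def followers_to_query(leader_version: int, reported: dict) -> list:
    """Followers that should receive another info request at this tick."""
    return sorted(
        name
        for name, version in reported.items()
        # Ask again if nothing was reported yet, or if the follower may
        # have been upgraded since it reported an older version.
        if version is None or version < leader_version
    )


reported = {"b": 2, "c": 1, "d": None}
print(followers_to_query(leader_version=2, reported=reported))  # ['c', 'd']
```

Once every follower reports a version at least as high as the leader's, the list is empty and no further requests are needed.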

Fixes #490.

@dumbbell dumbbell requested a review from kjnilsson December 24, 2024 12:21
@dumbbell dumbbell self-assigned this Dec 24, 2024
@dumbbell dumbbell force-pushed the delay-machine-upgrade branch 4 times, most recently from 2a179b8 to 267aa04 Compare December 26, 2024 10:39
@dumbbell dumbbell added this to the 2.16.0 milestone Dec 26, 2024
@dumbbell dumbbell force-pushed the delay-machine-upgrade branch from 267aa04 to 602ffda Compare December 26, 2024 17:23
@dumbbell dumbbell marked this pull request as ready for review December 26, 2024 17:40
Contributor

@kjnilsson kjnilsson left a comment

looks good, a few tweaks and a question as to whether we should optionally keep the old behaviour or not.

@dumbbell dumbbell force-pushed the delay-machine-upgrade branch from 602ffda to 782d1ab Compare January 6, 2025 10:32
@dumbbell dumbbell marked this pull request as draft January 6, 2025 10:33
@dumbbell dumbbell requested a review from kjnilsson January 6, 2025 10:34
@dumbbell dumbbell force-pushed the delay-machine-upgrade branch 2 times, most recently from 1057a7d to 6b69268 Compare January 7, 2025 11:14
[Why]
Before this patch, a Ra cluster would switch to a new machine version
immediately after a leader with that version was elected.

Because a candidate needs only a quorum of votes to be elected leader,
the cluster could start using the new machine version as soon as a
quorum of members supported it.

Unfortunately, the other members, which do not support it, stop applying
commands because they run an older version of the machine code. For some
consumers of Ra, like Khepri, this means the service can stop working
locally on such a member until it is restarted with the new machine
version.

We want to delay the machine upgrade to a point where all members know
about the new version. This ensures all members can continue to provide
their service.

[How]
The machine version to use is communicated by the leader using the
`noop` command. This command is the first one sent just after an
election. Before this patch, the version it carried was the leader's
local machine version.

With this patch, the `noop` command sent after an election passes the
effective machine version, unless the leader is the only member of its
cluster, in which case it passes the latest machine version. In a
cluster with several members, the leader therefore sends a second `noop`
command with a newer machine version later, once all members support it.

To determine what each follower supports, this patch introduces two
commands:
* `#info_rpc{}`
* `#info_reply{}`

Once a leader is elected, in addition to the `noop` command, it sends an
`#info_rpc{}` command to all followers. They reply with `#info_reply{}`
with the machine version they support. This mechanism is not specific to
machine upgrades: this could be extended in the future to communicate
more details about each follower.

Once the leader has received the machine version of every follower, it
can determine the highest machine version that all members support. For
that, it simply takes the lowest reported machine version (including the
leader's own). If this version is greater than the effective machine
version, the leader sends a new `noop` command with the new machine
version to use.

At each "tick", the leader sends the `#info_rpc{}` command again to
every follower that has not reported anything yet or whose reported
machine version is lower than the leader's own supported machine
version. This takes care of followers that did not receive the initial
`#info_rpc{}` and of those that were restarted as part of an upgrade.

Fixes #490.

V2: Address comments from @kjnilsson:
    * Use an empty map by default in `#info_reply{}` instead of
      `undefined`. This simplifies the handling of the reply with a
      single `lists:foldl/3` instead of two.
    * Merge `has_enough_peer_info/1` into
      `get_max_supported_machine_version/1`.
    * Add a system-level option to restore the Ra 2.15 behavior.
@dumbbell dumbbell force-pushed the delay-machine-upgrade branch from 6b69268 to e9c82a0 Compare January 9, 2025 10:14
@dumbbell dumbbell marked this pull request as ready for review January 9, 2025 10:14
@dumbbell dumbbell requested a review from kjnilsson January 9, 2025 10:14
@kjnilsson kjnilsson merged commit 7a6b85d into main Jan 9, 2025
6 checks passed
@dumbbell dumbbell deleted the delay-machine-upgrade branch January 9, 2025 10:55
Development

Successfully merging this pull request may close these issues.

Delay machine version upgrade until all members have the new version