Control which nodes restart dockerd on MCR/Engine Upgrades via Launchpad #530

james-nesbitt · 2024-12-09T15:16:28Z

A useful enhancement for launchpad would be to let users specify, during an MCR/Engine upgrade, on which nodes a restart of dockerd occurs. Some large clusters have their upgrades performed in batches of workers, so that pods can be shifted around to avoid impact. The idea is that a user could name the specific host(s) where dockerd is restarted, instead of launchpad restarting dockerd on all hosts one by one in a linear fashion. This would reduce unnecessary impact and disruption during an upgrade.

As a suggestion, perhaps add a "don't restart" flag to launchpad, which would tell launchpad to not do anything during the "Restart MCR" phase. Thank you!

@abrainerd : this is migrated from the other repo

ebourgeois · 2024-12-09T16:10:50Z

I really like the idea of "don't restart any" and allowing users to restart as they see fit.

james-nesbitt · 2025-01-16T11:20:21Z

There are two issues with preventing restarts:

1. launchpad no longer causes the restarts, except for cases where there is a change in the MCR daemon.json for a host. (We did have a hypothesis that launchpad is detecting changes when there are none, but that needs to be verified.) It is now the packaging and process management systems (like systemd) that restart MCR.
2. If the MCR daemon, containerd or runc components are upgraded without any restarts, then the system will be in an unpredictable state, which would cause unknown problems and perhaps confuse MKE.
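The hypothesis that changes are being detected when there are none could be checked by comparing the parsed configuration rather than the raw file text. The following is only a minimal sketch of that idea, not launchpad's actual code; the function name `daemon_config_changed` and the sample settings are assumptions made for illustration.

```python
import json


def daemon_config_changed(current_text: str, desired_text: str) -> bool:
    """Return True only if the two daemon.json documents differ in content.

    Hypothetical helper: parsing both documents first means key order and
    whitespace differences are not reported as changes.
    """
    return json.loads(current_text) != json.loads(desired_text)


# Same settings, different key order and formatting.
current = '{"log-driver": "json-file", "debug": false}'
desired = '{\n  "debug": false,\n  "log-driver": "json-file"\n}\n'

print(current != desired)                       # True: the raw text differs
print(daemon_config_changed(current, desired))  # False: the content is identical
```

If a byte-level comparison is what triggers the restart today, a difference like the one above would be enough to cause it, but that still needs to be verified against the real code path.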

We have a couple of options:

1. Allow a staged upgrade of workers, where a launchpad run limits the worker upgrade to certain nodes only (managers would still be upgraded when needed).
2. Try to trick systemd into not restarting the workers. Whether this would work is unknown.
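To make the staged-upgrade option more concrete, here is a rough sketch of the selection logic it implies: managers are always eligible, while workers are upgraded only if named for the current run. This is an illustration under assumptions, not a launchpad design; `Host`, `select_for_upgrade`, `batches`, and the addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    address: str
    role: str  # "manager" or "worker"


def select_for_upgrade(hosts, worker_allowlist):
    """Keep every manager, plus only the workers named in worker_allowlist."""
    allowed = set(worker_allowlist)
    return [h for h in hosts if h.role == "manager" or h.address in allowed]


def batches(items, size):
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


hosts = [
    Host("10.0.0.1", "manager"),
    Host("10.0.0.11", "worker"),
    Host("10.0.0.12", "worker"),
    Host("10.0.0.13", "worker"),
]

# One run that upgrades the manager and a single chosen worker.
selected = select_for_upgrade(hosts, ["10.0.0.12"])
print([h.address for h in selected])

# Planning the workers in groups of two, one group per run.
workers = [h.address for h in hosts if h.role == "worker"]
print(batches(workers, 2))
```

With something like this, an operator would drain the workers in one batch, run launchpad restricted to that batch, and repeat, so the restarts triggered by the package upgrade only hit nodes that have already been emptied.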

james-nesbitt mentioned this issue Dec 9, 2024

Control which nodes restart dockerd on MCR/Engine Upgrades via Launchpad Mirantis/launchpad_legacy#101

Closed

This was referenced Jan 16, 2025

Drain worker nodes before upgrading MCR #354

Open

PRODENG-2826 MCR Uninstall now swarm drains and prunes volumes #535

Merged
