
Control which nodes restart dockerd on MCR/Engine Upgrades via Launchpad #530

Open
james-nesbitt opened this issue Dec 9, 2024 · 2 comments

Comments

@james-nesbitt
Collaborator

A useful enhancement for launchpad would be the ability for users to specify, during an MCR/Engine upgrade, on which nodes dockerd is restarted. Some large clusters are upgraded in batches of workers so that pods can be moved around to avoid impact. The idea is that a user could name the specific host(s) on which dockerd should be restarted, instead of launchpad restarting dockerd on every host, one by one, in a linear fashion. This would reduce unnecessary impact and disruption during an upgrade.

As a suggestion, perhaps a "don't restart" flag could be added to launchpad, which would tell launchpad to do nothing during the "Restart MCR" phase. Thank you!

@abrainerd: this was migrated from the other repo.

@ebourgeois

I really like the idea of a "don't restart any" option that lets users restart as they see fit.

@james-nesbitt
Collaborator Author

There are two issues with preventing restarts:

  1. launchpad no longer causes the restarts, except when a host's MCR daemon.json changes (we did have a hypothesis that launchpad is detecting changes when there are none, but that still needs to be verified). It is now the package manager and the process management system (such as systemd) that restart MCR.
  2. If the MCR daemon, containerd, or runc components are upgraded without any restarts, the system will be left in an unpredictable state, which would cause unknown problems and could confuse MKE.
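One way to test the hypothesis in point 1 is to check whether a "change" in daemon.json is only a difference in formatting. The Go sketch below is an assumption, not launchpad's actual implementation: the function names `daemonJSONChanged` and `decodeDaemonJSON` are hypothetical. It decodes both documents and compares them structurally, so reordered keys or different whitespace do not count as a change, and it treats an empty or missing file as an empty object.

```go
package main

import (
	"bytes"
	"encoding/json"
	"reflect"
)

// decodeDaemonJSON (hypothetical helper) parses a daemon.json document into
// generic Go values. Empty or whitespace-only input is treated as "{}", which
// is assumed here to be equivalent to having no daemon.json at all.
func decodeDaemonJSON(b []byte) (any, error) {
	if len(bytes.TrimSpace(b)) == 0 {
		return map[string]any{}, nil
	}
	var v any
	if err := json.Unmarshal(b, &v); err != nil {
		return nil, err
	}
	return v, nil
}

// daemonJSONChanged (hypothetical) reports whether two daemon.json documents
// differ in content. Because the comparison is done on decoded values,
// differences in key order, indentation or trailing newlines are ignored.
func daemonJSONChanged(current, desired []byte) (bool, error) {
	cur, err := decodeDaemonJSON(current)
	if err != nil {
		return false, err
	}
	want, err := decodeDaemonJSON(desired)
	if err != nil {
		return false, err
	}
	return !reflect.DeepEqual(cur, want), nil
}
```

If a byte-for-byte comparison reports a change where this structural comparison does not, that would support the hypothesis that restarts are being triggered by formatting-only differences.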

We have a couple of options:

  1. Allow a staged upgrade of workers, letting a launchpad run limit worker upgrades to certain nodes only (managers would still be upgraded when needed).
  2. Try to trick systemd into not restarting the workers (feasibility unknown).
