Fixing race condition during cluster upgrade with kernel upgrade (#1222)
The following scenario causes worker pods to get stuck during a cluster upgrade:
1) A kernel module is loaded into the node. The NMC contains both spec and status using the current kernel version.
2) The cluster upgrade starts. As part of the upgrade, the node becomes Unschedulable.
3) Module-NMC removes the Spec from the NMC, since the node is Unschedulable.
4) The node becomes schedulable.
5) The NMC controller tries to unload the kernel module using the NMC Status configuration, which contains the old kernel.
6) Worker unload pods get stuck in Error, since the Node is running the new kernel.
7) Module-NMC updates the Spec of the NMC, but since a worker pod exists, nothing is done.

Solution: When processing orphaned NMC statuses (status exists but spec does not), the NMC controller should ignore modules that have statuses created prior to the Node's Ready timestamp.
1 parent aed4d51, commit 898c278
Showing 3 changed files with 93 additions and 85 deletions.