
fix(cilium): add taint and install cilium first - WIP #275

Merged
merged 4 commits into staging from master on Aug 27, 2022

Conversation

t1murl
Contributor

@t1murl t1murl commented Aug 13, 2022

This is a WIP PR to fix #270 .

I've just had a few minutes to look at this so far so please consider this as me sharing my idea.

When I've had more time to test and finalize, I'll mark this ready for review.
However, please do not hesitate to edit if someone else finds time earlier than me 👍 .

I've added ToDo comments where I left off.

Edit: Done force pushing, sorry for that.

@t1murl t1murl force-pushed the master branch 4 times, most recently from e967687 to c053b3e on August 14, 2022 03:08
@mysticaltech
Collaborator

Very good start @t1murl, thank you :) I would just love you to keep the HelmChart and values logic, as it's ideal IMHO. If not, why?

@t1murl
Contributor Author

t1murl commented Aug 14, 2022

@mysticaltech , thanks! 👍
The k3s HelmChart controller spawns a pod to perform the helm chart installation. If we taint the node with the Cilium NoExecute taint, that pod does not tolerate the taint and stays stuck in Pending (so far I have not found a way to add tolerations to the pod spawned by the controller), so the chart never gets installed. Please note that with my current draft you do not lose the customization through the values option.
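For illustration, the plain-helm path in this draft looks roughly like the following sketch (repository URL, release name, namespace and values file path are generic assumptions, not the exact code in this PR):

```bash
# Sketch: install Cilium with the helm CLI instead of the HelmChart controller,
# while keeping customization via a values file.
helm repo add cilium https://helm.cilium.io/
helm repo update
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --values ./cilium-values.yaml
```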

@t1murl
Contributor Author

t1murl commented Aug 14, 2022

Also just noticed the k3s docs stating:

K3s agents can be configured with the options --node-label and --node-taint which adds a label and taint to the kubelet. The two options only add labels and/or taints at registration time, so they can only be added once and not changed after that again by running K3s commands.

It seems like the taint is not re-applied after a node reboot, potentially leaving the initial problem description unresolved by what I've done so far.
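For context, the registration-time taint recommended by the Cilium docs is wired into k3s roughly like this (a sketch only; placeholders in angle brackets, and the exact wiring in this module may differ):

```bash
# Sketch: the taint is only added when the node registers with the cluster.
# A plain OS reboot of an already-registered node does not re-apply it.
k3s agent \
  --server "https://<first-control-plane-ip>:6443" \
  --token "<node-token>" \
  --node-taint "node.cilium.io/agent-not-ready=true:NoExecute"
```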

I'll look into what we may be able to do with e.g. kured (i.e. have it taint the node before it issues the reboot) to extend the current draft.

@mysticaltech
Collaborator

Got it! We can execute arbitrary commands with Kured. So maybe a custom bash script would help resolve the situation?!
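For example, kured's --reboot-command flag could point at a small wrapper that taints first and then reboots (the flag is a real kured flag; the wrapper path and its contents are hypothetical):

```bash
# Sketch: tell kured to run a wrapper instead of the default reboot command.
kured --reboot-command="/usr/local/bin/taint-then-reboot.sh"

# Hypothetical /usr/local/bin/taint-then-reboot.sh:
#   kubectl taint node "$(hostname)" node.cilium.io/agent-not-ready=true:NoExecute --overwrite
#   systemctl reboot
```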

@mysticaltech
Collaborator

Or a startup script for the Cilium nodes; MicroOS offers that kind of functionality easily.

@t1murl
Contributor Author

t1murl commented Aug 14, 2022

@mysticaltech , that also sounds promising! I am not really familiar with MicroOS... I reckon you mean this? Any help on this is more than welcome :)

@mysticaltech
Collaborator

@t1murl Combustion or Ignition are for first boot. Am thinking more in terms of https://www.suse.com/c/easy-running-scripts-boot-and-shutdown/

@t1murl
Contributor Author

t1murl commented Aug 14, 2022

FYI: Tainting the node while cilium is still running will result in cilium instantly deleting the taint again. I feel like we might want to taint the node when it is in a state where cilium is not running (anymore) and before pods start coming up (again).

@t1murl
Contributor Author

t1murl commented Aug 14, 2022

@t1murl Combustion or Ignition are for first boot. Am thinking more in terms of https://www.suse.com/c/easy-running-scripts-boot-and-shutdown/

@mysticaltech From quickly reading into this, this indeed sounds great to me for correctly timing the tainting! Very good idea in my opinion 🙂👍

I feel like we should try this. We only need to find a neat way to run e.g. kubectl taint node $(hostname) "node.cilium.io/agent-not-ready:NoExecute" (or the respective API call). Currently there does not seem to be a kubeconfig ready to use on the worker nodes.
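Something along these lines, once a kubeconfig has been made available on the node (the path below is an assumption):

```bash
# Sketch: taint the local node, pointing kubectl at a kubeconfig pushed to the agent.
kubectl --kubeconfig /etc/rancher/k3s/kubeconfig.yaml \
  taint node "$(hostname)" "node.cilium.io/agent-not-ready=true:NoExecute" --overwrite
```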

@mysticaltech
Collaborator

Perfect!! Yep, sounds good 😊

About the kubeconfig: it is already being downloaded from the first control plane node (in init.tf, if I remember correctly), so we could re-upload it to the worker nodes, for instance!

@mysticaltech
Collaborator

@t1murl I just came back from a break; I'll try to help with this ASAP.

@mysticaltech
Collaborator

@PurpleBooth If you have some bandwidth, you know how much your contributions are appreciated! :) 🙏

@PurpleBooth
Contributor

PurpleBooth commented Aug 24, 2022

Let me reflect back my understanding of the problem here.

What we're trying to avoid is containers that are not managed by Cilium, i.e. ones that were launched before Cilium was fully initialised? From what I understand, Cilium uses two techniques for this: firstly, you can use this taint, or secondly, you can restart the pods. We probably don't want to be randomly restarting pods, so the taint is the nicer option.

However, adding this taint means that the HelmController's pod won't be scheduled, because the tolerations for bootstrap: true do not include the special taint. So we need a way to get the helm chart to run.

The approach here is to use the helm CLI rather than the controller. Other approaches might include adding the toleration to the pod spawned by the controller, using the cilium CLI (which is basically the same as the helm chart), or rendering the chart with helm template somewhere in this module.
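As a rough sketch of that last alternative (repo URL and values path are assumptions):

```bash
# Sketch: render the Cilium chart to plain manifests and apply them directly,
# bypassing the helm-controller job entirely.
helm repo add cilium https://helm.cilium.io/
helm template cilium cilium/cilium \
  --namespace kube-system \
  --values ./cilium-values.yaml | kubectl apply -f -
```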

I dropped a question over at the HelmChart repository to see if they have any thoughts (k3s-io/helm-controller#151). I find it hard to believe we are the only ones to have experienced this.

@mysticaltech
Collaborator

@PurpleBooth I did not realize that the Helm controller was in play even after Cilium is installed. Good catch!

@t1murl Does that make sense to you?

@t1murl
Contributor Author

t1murl commented Aug 25, 2022

@PurpleBooth I did not realize that the Helm controller was in play even after Cilium is installed. Good catch!

@t1murl Does that make sense to you?

@PurpleBooth , thanks for sharing your reflection on this problem and thanks for opening k3s-io/helm-controller#151! 👍
I feel this will help us understand if/how we can use the HelmController to install the Cilium chart when tainting the nodes, instead of my proposal of using plain helm + kubectl apply (or one of the other approaches you mentioned that avoid the HelmController).
Please note that with the proposal in this PR, the initial deployment works as expected (simply by installing Cilium before anything else). While the Cilium pods are starting, the HelmController pods (which do not tolerate the taint) that install everything else patiently wait in Pending until Cilium removes the taint once it is ready, so no issues here.

We still have to implement something that accounts for node reboots after the initial deployment, i.e. the problem description I provided in #270, where I described how pods become unmanaged by Cilium after a node reboot by e.g. kured, because they are already scheduled and start faster than the Cilium pods when the node comes back up.

To account for this, my understanding of a potentially feasible solution would be to use @mysticaltech's proposal mentioned in #275 (comment) to taint the node again (required due to #275 (comment)) when a reboot occurs (also see #275 (comment) regarding the timing, and #275 (comment)). Of course, this only accounts for 'safe/regular' node reboots and not power loss or unexpected shutdowns that bypass @mysticaltech's proposed shutdown-script solution, which I would find acceptable.

cc: @mysticaltech

PS: Unfortunately even my weekends are currently busy (birthdays & weddings to attend), so a few more days will pass until I can personally try out @mysticaltech's proposal. Happy to hear you are back, @mysticaltech, to assist with this! 👍

@mysticaltech
Collaborator

@t1murl Thanks for clarifying! It helps a lot. As soon as any of us starts working on this, let's signal it here so that we do not overlap on the same things. I'll do my best ASAP, if @PurpleBooth doesn't get this sorted before me (she tends to be fast when she starts!). And I'm looking forward to a reply to the issue she opened.

@PurpleBooth
Contributor

For restarts we could use Kured: https://github.com/weaveworks/kured#adding-node-labels-before-and-after-reboots
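For reference, the flags from that README section look roughly like this (flag names as in the kured docs; the label key and values here are placeholders, and note the follow-up comments below: these set labels, not taints):

```bash
# Sketch: kured can add/remove node labels around a reboot.
kured \
  --pre-reboot-node-labels=example.com/kured-reboot=true \
  --post-reboot-node-labels=example.com/kured-reboot=false
```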

@mysticaltech
Collaborator

Excellent find, @PurpleBooth, that's another obstacle out of the way!!

@mysticaltech
Collaborator

mysticaltech commented Aug 26, 2022

I believe it's now worth tweaking this PR and giving it a try as is. Maybe the Cilium containers will start after all. And if they successfully remove the label, that's it, unless I'm not seeing things as clearly as you. It's also important to remember that not all nodes restart simultaneously, and in an HA setting, at least two CP nodes would already have been bootstrapped and have Cilium running on them. @PurpleBooth What do you think?

@t1murl
Contributor Author

t1murl commented Aug 26, 2022

@PurpleBooth , @mysticaltech to avoid confusion: node labels != node taints :O AFAIK node labels are not the same thing as node taints.
Having kured apply a node label will not prevent pods from starting before the Cilium pods are ready.
Cilium's answer for this situation is a node taint, as mentioned before (https://docs.cilium.io/en/v1.12/gettingstarted/taints/#taint-effects). A taint is what causes pods that do not tolerate it to not be scheduled, or in our case not executed, when tainting with :NoExecute as per the Cilium recommendation.

I've already considered what we could do with kured; however, I've not found a way to have it add a taint out of the box in a way that would help us in the context of this issue. I then thought about having it execute a local script to taint the node before it issues the reboot command (as also mentioned in #275 (comment)). I found this not feasible, however, due to #275 (comment).

I hope this helps :)

@mysticaltech
Collaborator

@t1murl My bad, I was confusing the two!

@mysticaltech
Collaborator

mysticaltech commented Aug 27, 2022

@t1murl @PurpleBooth If the Cilium CLI can solve our issues, then let's move away from the helm controller if needed. We can install it on the CP nodes at the same time the k3s SELinux RPM is installed, so that no additional reboots are needed.
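For reference, the cilium CLI route would be roughly this (a minimal sketch; version pinning and any custom values are left out):

```bash
# Sketch: install Cilium with the cilium CLI from a control-plane node,
# then wait until the agents report ready.
cilium install
cilium status --wait
```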

@mysticaltech
Collaborator

mysticaltech commented Aug 27, 2022

And I have verified how to run startup scripts; it's basically via systemd, as explained here https://www.golinuxcloud.com/run-script-at-startup-boot-without-cron-linux/.
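A minimal sketch of that systemd route, assuming kubectl and a kubeconfig are available on the node (the unit name, binary path and kubeconfig location are all assumptions):

```bash
# Sketch: provision a oneshot unit that re-applies the Cilium taint early at boot,
# before workloads come back up. %H expands to the node's hostname.
cat > /etc/systemd/system/cilium-node-taint.service <<'EOF'
[Unit]
Description=Re-apply the Cilium agent-not-ready taint at boot
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
Environment=KUBECONFIG=/etc/rancher/k3s/kubeconfig.yaml
ExecStart=/usr/local/bin/kubectl taint node %H node.cilium.io/agent-not-ready=true:NoExecute --overwrite

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable cilium-node-taint.service
```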

As said above, the kubeconfig.yaml is already being downloaded to the root of the user's module, so we have it in Terraform as a variable; it just needs to be pushed to agent nodes that don't have it.

I also believe it's worth ignoring the tolerations of the Helm controller for now. Let's see whether Cilium still runs once its taint is applied. If it does not, the wisest solution would be to move away from the Helm controller towards the Cilium CLI to install and run Cilium.

@mysticaltech
Collaborator

mysticaltech commented Aug 27, 2022

Good news folks, there is a good chance that custom tolerations can be specified via the helm chart, in our case via the HelmChart definition. See this example for Longhorn (longhorn/longhorn#2664); other Rancher projects do the same.
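As a rough illustration of that mechanism (the HelmChart fields are the k3s helm-controller ones, but the chart-specific values key for tolerations below is a placeholder in the spirit of the linked Longhorn example):

```bash
# Sketch: pass tolerations through the chart's own values via a HelmChart manifest.
kubectl apply -f - <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: longhorn
  namespace: kube-system
spec:
  repo: https://charts.longhorn.io
  chart: longhorn
  targetNamespace: longhorn-system
  valuesContent: |-
    # Chart-specific values key (placeholder); the exact key depends on the chart.
    tolerations:
      - key: node.cilium.io/agent-not-ready
        operator: Exists
        effect: NoExecute
EOF
```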

@mysticaltech
Collaborator

mysticaltech commented Aug 27, 2022

Last but not least, various tolerations that Cilium supports are also supported by the helm controller jobs: https://github.com/k3s-io/helm-controller/search?q=toleration&type=commits

That said, from what I gather, custom tolerations in the HelmChart YAML would go under "spec".

@mysticaltech
Collaborator

@t1murl I will close this, as I will redo most of it to keep the current HelmChart method. But it's good inspiration, thank you!

@mysticaltech
Collaborator

mysticaltech commented Aug 27, 2022

Changed my mind, I'm merging instead so I have you in the git history, @t1murl; it's very important to me as you've helped so much 🙏. I will compare against master and make the required changes.

@mysticaltech mysticaltech changed the base branch from master to staging August 27, 2022 16:44
@mysticaltech mysticaltech marked this pull request as ready for review August 27, 2022 16:45
@mysticaltech mysticaltech merged commit 57016ac into kube-hetzner:staging Aug 27, 2022
Successfully merging this pull request may close these issues.

Cilium failing after automatic reboot