storageos pod is not evicted when kube node is NotReady #122
Comments
Hi, can you provide more details about your deployment? The cluster config file or a description of the storageoscluster resource would be very helpful. If the operator version you're running is from the master branch rather than a stable release, this is the right behavior: in #110 we added a pod priority class to the StorageOS pods to make them critical resources, so they are not evicted. We need this because storage is a critical part of a cluster.
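For readers unfamiliar with pod priority, here is a minimal sketch of the mechanism being described. The class name, value, and pod fields below are hypothetical illustrations, not the operator's actual manifests:

```yaml
# Hypothetical PriorityClass; the one shipped by the operator may differ.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: storageos-critical        # hypothetical name
value: 1000000                    # high value => evicted/preempted last
globalDefault: false
description: "Keep StorageOS pods running; storage is critical to the cluster."
---
# A pod (or a DaemonSet's pod template) opts in by referencing the class name.
apiVersion: v1
kind: Pod
metadata:
  name: storageos-node            # hypothetical name
spec:
  priorityClassName: storageos-critical
  containers:
  - name: storageos
    image: storageos/node:latest  # illustrative image reference
```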
Sure,
my namespace is storageos, and I have made no changes so far to the files I got from master.
when the "node" removed form the cluster, it is shown in the "kubectl get pods" |
In that case, I don't think there's anything wrong or anything that can be done. It's the k8s scheduler that makes decisions about eviction based on various factors. Our goal is to avoid eviction as much as possible; we don't have control over the scheduler's decision, and I don't see anything bad with it. Usually the k8s scheduler takes action when a node has been unavailable for a minute or so, so waiting for some time may get the pod evicted automatically. But again, it doesn't matter whether it keeps running or gets evicted: StorageOS runs as a DaemonSet on all the other available nodes.
Also, |
If I enable/put the node back into the cluster, the container does not start (restart) properly - that is what I saw yesterday. If I evict the pod manually (force pod delete) and then add the node back, the DaemonSet restarts the container and it starts successfully. If I add/edit "tolerations:" for the pod:
with
right before I remove the node, the pod gets evicted successfully, and then restarts successfully as well when the node is added back. That is what I saw yesterday; if you want me to repeat it, please let me know.
If that's what's happening, we can add tolerations for all the resources, something like this in the pod spec:
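The exact snippet from this comment is not preserved in the thread; below is a minimal sketch of what such tolerations typically look like, assuming the standard node.kubernetes.io taint keys and an illustrative 30-second window before eviction:

```yaml
# Illustrative tolerations for the pod spec: tolerate a failed node for
# 30 seconds (value chosen for illustration), then let the pod be evicted.
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
```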
That should improve the recovery time for all the pods.
Yes. What file needs to be modified? Or is it in the operator's source code?
To test it first, you can directly edit the DaemonSet resource and add the toleration to it (see the sketch below for where it goes). Thanks for highlighting this issue.
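If you go that route, the toleration belongs under the pod template, i.e. spec.template.spec.tolerations. A rough sketch, with most fields omitted; the DaemonSet and namespace names are taken from this thread and may differ in your deployment:

```yaml
# Rough sketch: where the toleration sits when editing the DaemonSet directly
# (for example via "kubectl edit"). Other required fields are omitted.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: storageos-daemonset   # name as mentioned in this thread
  namespace: storageos        # namespace as mentioned in this thread
spec:
  template:
    spec:
      tolerations:
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 30   # illustrative value
```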
I agree. Since "node down" is a possible event that we need to heal from, I think evicting the pod makes sense - to my taste, as soon as possible - and then restarting it when the node is back.
I tried this and I'm afraid we can't do much when it comes to DaemonSets.
We can't set the toleration time. I guess we have to wait for the k8s scheduler to automatically detect the failure and restart the DaemonSet pod.
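For context (this is standard Kubernetes behavior described in its documentation, not specific to this operator): the DaemonSet controller automatically adds NoExecute tolerations for the not-ready and unreachable taints with no tolerationSeconds, so DaemonSet pods tolerate a failed node indefinitely and a user-supplied toleration time has no effect. Roughly:

```yaml
# Tolerations the DaemonSet controller adds to its pods automatically
# (illustrative; omitting tolerationSeconds means "tolerate forever", so the
# pod stays bound to the NotReady node instead of being evicted).
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
```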
More info here:
Hello,
I see that the storageos-daemonset pod is not being "removed" from the NotReady node.
I waited for minutes.