Kubernetes controller which watches for pods which have certain failure states and removes them.
Since the introduction of Graceful node shutdown,
pods which were running a node which was shutdown are left in the cluster with
a state of Terminated
.
This leads to issues such as prometheus/prometheus#10257, and can lead to alerts firing unnecessarily.
Use Kustomize, or deploy the docker image yourself:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: failed-pod-controller
resources:
- github.com/playerdata/failed-pod-controller?ref=main
Available Env Vars:
Name | Description |
---|---|
CONTROLLER_DRY_RUN |
If truthy, won't actually delete any pods |