Skip to content

cybozu-go/zombie-detector

Repository files navigation

GitHub release CI PkgGoDev Go Report Card

Zombie detector

Zombie detector is a tool to detect Kubernetes resources that remain for a long time after deletion request.

Background

In Kubernetes, resources sometimes remain undeleted after they got deletionTimestamp. It may occur from a failure to remove finalizers or a bug in Kubernetes operators. These remaining resources (we call them zombie resources) can cause various problems, so we want to detect them.

The zombie-detector is a short-lived program, which inspects a cluster and lists zombie resources. It sends metrics to a Pushgateway, and they can be scraped later. This allows us to check the status of zombie resources on a dashboard and make alerts.

Features

  • It detects resources that remain undeleted after a certain period with a deletionTimestamp.
  • Elapsed time from deletion request and metadata of resources are pushed into Pushgateway.
  • We can use this both inside and outside cluster.

Build

CLI

go build

Docker Image

make docker-build

Usage

Usage:
  zombie-detector [flags]

Flags:
  -h, --help                 help for zombie-detector
      --pushgateway string   URL of Pushgateway's endpoint. If this flag is not given, the result outputs to stdout
      --threshold duration   threshold of detection (default 24h0m0s)
  -v, --version              version for zombie-detector

example

zombie-detector --incluster=false --pushgateway=<YOUR PUSHGATEWAY ADDRESS> --threshold=24h30m

Example manifest

We can run zombie-detector periodically as a CronJob in a Kubernetes cluster.

These are example manifests.

cronjob.yaml

apiVersion: batch/v1
kind: CronJob
metadata:
  name: zombie-detector-cronjob
  namespace: zombie-detector
spec:
  schedule: "0 0 */1 * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: zombie-detector-sa
          containers:
          - name: zombie-detector
            image: zombie-detector:dev
            command:
            - ./zombie-detector
            - --threshold=24h
            - --pushgateway=http://<YOUR PUSHGATEWAY SERVICE ADRESS>.monitoring.svc.cluster.local:9091
          restartPolicy: OnFailure

rbac.yaml

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: zombie-detector-sa
  namespace: zombie-detector
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: zombie-detector-role
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - '*'
  resources:
  - '*/*'
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: zombie-detector-rolebinding
subjects:
  - kind: ServiceAccount
    name: zombie-detector-sa
    namespace: zombie-detector
roleRef:
  kind: ClusterRole
  name: zombie-detector-role
  apiGroup: rbac.authorization.k8s.io