Add runbook for PrometheusOperatorReconcileError alert
poornima-krishnasamy committed Jun 17, 2024
1 parent 0cbf19d commit a69fa34
Showing 1 changed file with 29 additions and 0 deletions.
29 changes: 29 additions & 0 deletions runbooks/source/prometheus-reconcile-alert.html.md.erb
@@ -0,0 +1,29 @@
---
title: How to Investigate PrometheusOperatorReconcile Errors
weight: 218
last_reviewed_on: 2024-06-17
review_in: 6 months
---

# <%= current_page.data.title %>

When you see a `PrometheusOperatorReconcile` alert in the `low-priority-alerts` channel, it means that the Prometheus Operator is unable to reconcile the state of the Prometheus resources in the cluster.
This means that some of the Prometheus rules or alerts have issues and have not been applied successfully.

## Troubleshooting

Check the logs of the Prometheus Operator pod to see if there are any errors:

```bash
kubectl logs -n monitoring prometheus-operator-kube-p-operator-<pod-id> -f
```

Look for an error like the one below:
```
level=info ts=2024-02-23T10:31:29.0543824Z caller=rules.go:345 component=prometheusoperator msg="Invalid rule" err="group \"XXX-elasticache\", rule 1, \"elasticache-enginecpu-utilisation\": annotation \"message\": template: __alert_elasticache-enginecpu-utilisation:1: undefined variable \"$clusterId\""
```

An invalid rule like this can stop Prometheus from sending alerts to certain channels, and can stop new or changed rules from being applied. If that happens, you may also see a `PrometheusErrorSendingAlertsToSomeAlertmanagers` alert.
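
When the operator log is noisy, it can help to reduce it to just the rule groups that are failing. The following is a minimal sketch, assuming the log lines follow the `msg="Invalid rule" err="group \"...\""` format shown above; the function name `invalid_rule_groups` is hypothetical and not part of any existing tooling.

```bash
# Hypothetical helper: read Prometheus Operator log lines on stdin and
# print the unique rule group named in each "Invalid rule" entry.
# Assumes the err="group \"<name>\", ..." format shown in the example above.
invalid_rule_groups() {
  grep 'msg="Invalid rule"' \
    | sed -n 's/.*err="group \\"\([^\\]*\)\\".*/\1/p' \
    | sort -u
}

# Usage (replace the pod name with the real operator pod):
#   kubectl logs -n monitoring prometheus-operator-kube-p-operator-<pod-id> | invalid_rule_groups
```

The group name printed here is the starting point for locating the PrometheusRule that needs fixing.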

You will need to fix the erroring PrometheusRule. If the rule is not configured in the [cloud-platform-environments](https://github.com/ministryofjustice/cloud-platform-environments) repository,
find the namespace the rule is applied in, then contact the owning team through their Slack channel, or the last person who changed the rule, and ask them to fix it.
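
One way to find the namespace is to list every PrometheusRule together with the groups it defines, then filter on the group name from the error. This is a sketch under assumptions: the function name `rules_with_group` is hypothetical, and the filter expects three whitespace-separated columns (namespace, rule name, comma-separated group names), so it will not handle group names that contain spaces.

```bash
# Hypothetical helper: read "NAMESPACE NAME GROUPS" lines on stdin and print
# "namespace/name" for each PrometheusRule that defines the given group.
rules_with_group() {
  awk -v group="$1" '{
    n = split($3, groups, ",")
    for (i = 1; i <= n; i++) {
      if (groups[i] == group) { print $1 "/" $2; break }
    }
  }'
}

# Usage: list all PrometheusRules with their group names, then filter.
#   kubectl get prometheusrules --all-namespaces --no-headers \
#     -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,GROUPS:.spec.groups[*].name' \
#     | rules_with_group "XXX-elasticache"
```

Once you have the namespace, `kubectl describe namespace <namespace>` shows its labels and annotations, which may include the team's contact details if the namespace was set up with them.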
