diff --git a/docs/en/observability/images/log-threshold-breach-alert-history-chart.png b/docs/en/observability/images/log-threshold-breach-alert-history-chart.png
new file mode 100644
index 0000000000..bc9f274d8b
Binary files /dev/null and b/docs/en/observability/images/log-threshold-breach-alert-history-chart.png differ
diff --git a/docs/en/observability/images/log-threshold-breach-condition-chart.png b/docs/en/observability/images/log-threshold-breach-condition-chart.png
new file mode 100644
index 0000000000..ff4c268493
Binary files /dev/null and b/docs/en/observability/images/log-threshold-breach-condition-chart.png differ
diff --git a/docs/en/observability/images/log-threshold-breach-log-rate-analysis.png b/docs/en/observability/images/log-threshold-breach-log-rate-analysis.png
new file mode 100644
index 0000000000..2854f11f19
Binary files /dev/null and b/docs/en/observability/images/log-threshold-breach-log-rate-analysis.png differ
diff --git a/docs/en/observability/images/log-threshold-breach.png b/docs/en/observability/images/log-threshold-breach.png
deleted file mode 100644
index 200ddfb875..0000000000
Binary files a/docs/en/observability/images/log-threshold-breach.png and /dev/null differ
diff --git a/docs/en/observability/images/slo-burn-rate-breach.png b/docs/en/observability/images/slo-burn-rate-breach.png
index cdedd2d722..e8332751aa 100644
Binary files a/docs/en/observability/images/slo-burn-rate-breach.png and b/docs/en/observability/images/slo-burn-rate-breach.png differ
diff --git a/docs/en/observability/triage-slo-burn-rate-breaches.asciidoc b/docs/en/observability/triage-slo-burn-rate-breaches.asciidoc
index b0748e8b62..2fd77eb6e8 100644
--- a/docs/en/observability/triage-slo-burn-rate-breaches.asciidoc
+++ b/docs/en/observability/triage-slo-burn-rate-breaches.asciidoc
@@ -18,15 +18,23 @@ You can follow the links to navigate to the source SLO or rule definition.
 
 Explore charts on the page to learn more about the SLO breach:
 
+* *Burn rate chart*. The first chart shows the burn rate during the time range when the alert was active.
+The line indicates how close the SLO came to breaching the threshold.
++
 [role="screenshot"]
 image::images/slo-burn-rate-breach.png[Alert details for SLO burn rate breach]
-
-* The first chart shows the burn rate during the time range when the alert was active.
-The line indicates how close the SLO came to breaching the threshold.
-* The next chart shows the alerts history over the last 30 days.
-It shows the number of alerts that were triggered and the average time it took to recover after a breach.
-* Both timelines are annotated to show when the threshold was breached.
++
+[TIP]
+====
+The timeline is annotated to show when the threshold was breached. You can hover over an alert icon to see the timestamp of the alert.
+====
+
+* *Alerts history chart*. The next chart provides information about alerts for the same rule and group over the last 30 days.
+It shows the number of those alerts that were triggered per day, the total number of alerts triggered throughout the 30 days, and the average time it took to recover after a breach.
++
+[role="screenshot"]
+image::images/log-threshold-breach-alert-history-chart.png[Alert history chart in alert details for SLO burn rate breach]
 
 The number, duration, and frequency of these breaches over time gives you an indication of how severely the service is degrading so that you can focus on high severity issues first.
diff --git a/docs/en/observability/triage-threshold-breaches.asciidoc b/docs/en/observability/triage-threshold-breaches.asciidoc
index 9cc1341065..83a56c6dc9 100644
--- a/docs/en/observability/triage-threshold-breaches.asciidoc
+++ b/docs/en/observability/triage-threshold-breaches.asciidoc
@@ -19,22 +19,34 @@ You can follow the links to navigate to the rule definition.
 
 Explore charts on the page to learn more about the threshold breach:
 
+* *Charts for each condition*. The page includes a chart for each condition specified in the rule.
+These charts help you understand when the breach occurred and its severity.
++
 [role="screenshot"]
-image::images/log-threshold-breach.png[Alert details for log threshold breach]
+image::images/log-threshold-breach-condition-chart.png[Chart for a condition in alert details for log threshold breach]
++
+[TIP]
+====
+The timeline is annotated to show when the threshold was breached.
+You can hover over an alert icon to see the timestamp of the alert.
+====
 
-* The page includes a chart for each condition specified in the rule.
-These charts help you understand when the breach occurred and its severity.
-* If your rule is intended to detect log threshold breaches
+* *Log rate analysis chart*. If your rule is intended to detect log threshold breaches
 (that is, it has a single condition that uses a count aggregation), you can run a log rate analysis, assuming you have the required license.
 Running a log rate analysis is useful for detecting significant dips or spikes in the number of logs.
 Notice that you can adjust the baseline and deviation, and then run the analysis again.
 For more information about using the log rate analysis feature, refer to the {kibana-ref}/xpack-ml-aiops.html#log-rate-analysis[AIOps Labs] documentation.
-* The page may also include an alerts history chart that shows the number of triggered alerts per day for the last 30 days.
-This chart is currently only available for rules that specify a single condition.
-* Timelines on the page are annotated to show when the threshold was breached.
-You can hover over an alert icon to see the timestamp of the alert.
++
+[role="screenshot"]
+image::images/log-threshold-breach-log-rate-analysis.png[Log rate analysis chart in alert details for log threshold breach]
+
+* *Alerts history chart*. The next chart provides information about alerts for the same rule and group over the last 30 days.
+It shows the number of those alerts that were triggered per day, the total number of alerts triggered throughout the 30 days, and the average time it took to recover after a breach.
++
+[role="screenshot"]
+image::images/log-threshold-breach-alert-history-chart.png[Alert history chart in alert details for log threshold breach]
 
 Analyze these charts to better understand when the breach started, it's current state, and how the issue is trending.