[8.x](backport #4301) Update alert history chart screenshots #4322

Merged (1 commit) on Sep 30, 2024
Binary file not shown.
Binary file modified docs/en/observability/images/slo-burn-rate-breach.png
20 changes: 14 additions & 6 deletions docs/en/observability/triage-slo-burn-rate-breaches.asciidoc
@@ -18,15 +18,23 @@ You can follow the links to navigate to the source SLO or rule definition.

Explore charts on the page to learn more about the SLO breach:

* *Burn rate chart*. The first chart shows the burn rate during the time range when the alert was active.
The line indicates how close the SLO came to breaching the threshold.
+
[role="screenshot"]
image::images/slo-burn-rate-breach.png[Alert details for SLO burn rate breach]

* The first chart shows the burn rate during the time range when the alert was active.
The line indicates how close the SLO came to breaching the threshold.
* The next chart shows the alerts history over the last 30 days.
It shows the number of alerts that were triggered and the average time it took to recover after a breach.
* Both timelines are annotated to show when the threshold was breached.
+
[TIP]
====
The timeline is annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
====

* *Alerts history chart*. The next chart provides information about alerts for the same rule and group over the last 30 days.
It shows the number of those alerts that were triggered per day, the total number of alerts triggered throughout the 30 days, and the average time it took to recover after a breach.
+
[role="screenshot"]
image::images/log-threshold-breach-alert-history-chart.png[Alert history chart in alert details for SLO burn rate breach]

The number, duration, and frequency of these breaches over time give you an indication of how severely the service is degrading so that you can focus on high severity issues first.

28 changes: 20 additions & 8 deletions docs/en/observability/triage-threshold-breaches.asciidoc
@@ -19,22 +19,34 @@ You can follow the links to navigate to the rule definition.

Explore charts on the page to learn more about the threshold breach:

* *Charts for each condition*. The page includes a chart for each condition specified in the rule.
These charts help you understand when the breach occurred and its severity.
+
[role="screenshot"]
image::images/log-threshold-breach.png[Alert details for log threshold breach]
image::images/log-threshold-breach-condition-chart.png[Chart for a condition in alert details for log threshold breach]
+
[TIP]
====
The timeline is annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
====

* The page includes a chart for each condition specified in the rule.
These charts help you understand when the breach occurred and its severity.
* If your rule is intended to detect log threshold breaches
* *Log rate analysis chart*. If your rule is intended to detect log threshold breaches
(that is, it has a single condition that uses a count aggregation),
you can run a log rate analysis, assuming you have the required license.
Running a log rate analysis is useful for detecting significant dips or spikes in the number of logs.
Notice that you can adjust the baseline and deviation, and then run the analysis again.
For more information about using the log rate analysis feature,
refer to the {kibana-ref}/xpack-ml-aiops.html#log-rate-analysis[AIOps Labs] documentation.
* The page may also include an alerts history chart that shows the number of triggered alerts per day for the last 30 days.
This chart is currently only available for rules that specify a single condition.
* Timelines on the page are annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
+
[role="screenshot"]
image::images/log-threshold-breach-log-rate-analysis.png[Log rate analysis chart in alert details for log threshold breach]

* *Alerts history chart*. The next chart provides information about alerts for the same rule and group over the last 30 days.
It shows the number of those alerts that were triggered per day, the total number of alerts triggered throughout the 30 days, and the average time it took to recover after a breach.
+
[role="screenshot"]
image::images/log-threshold-breach-alert-history-chart.png[Alert history chart in alert details for log threshold breach]

Analyze these charts to better understand when the breach started, its current
state, and how the issue is trending.