Port #4301 to serverless #4320

Merged
25 changes: 16 additions & 9 deletions docs/en/serverless/alerting/triage-slo-burn-rate-breaches.mdx
@@ -22,19 +22,26 @@ You can follow the links to navigate to the source SLO or rule definition.

Explore charts on the page to learn more about the SLO breach:

![Alert details for SLO burn rate breach](../images/slo-burn-rate-breach.png)
* **Burn rate chart**. The first chart shows the burn rate during the time range when the alert was active.
The line indicates how close the SLO came to breaching the threshold.

* The first chart shows the burn rate during the time range when the alert was active.
The line indicates how close the SLO came to breaching the threshold.
* The next chart shows the alerts history over the last 30 days.
It shows the number of alerts that were triggered and the average time it took to recover after a breach.
* Both timelines are annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
![Alert details for SLO burn rate breach](../images/slo-burn-rate-breach.png)

<DocCallOut title="Tip">
The timeline is annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
</DocCallOut>

* **Alerts history chart**. The next chart provides information about alerts for the same rule and group over the last 30 days.
It shows the number of those alerts that were triggered per day, the total number of alerts triggered throughout the 30 days,
and the average time it took to recover after a breach.

![Alert history chart in alert details for SLO burn rate breach](../images/log-threshold-breach-alert-history-chart.png)

The number, duration, and frequency of these breaches over time give you an indication of how severely the service is degrading so that you can focus on high-severity issues first.
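
As a rough illustration of the value the burn rate chart plots, the sketch below uses the common definition of burn rate: the observed error rate divided by the error budget (1 minus the SLO target). The function names, the sample numbers, and the choice to treat a rate at or above the threshold as a breach are assumptions made for this example, not the product's implementation.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A value of 1.0 means the budget would run out exactly at the end of
    the SLO window; higher values mean it runs out sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_events == 0:
        # No traffic in the window, so no budget was consumed.
        return 0.0
    error_rate = bad_events / total_events
    error_budget = 1.0 - slo_target
    return error_rate / error_budget


def is_breach(rate: float, threshold: float) -> bool:
    """Assumption for this sketch: at or above the threshold counts as a breach."""
    return rate >= threshold
```

With a 99% target, 5 bad events out of 1,000 is an error rate of 0.5%, which is half of the 1% error budget, so the burn rate is about 0.5.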

<DocCallOut color="empty|warning|danger" title="Note">
The contents of the alert details page may vary depending on the type of SLI that's defined in the SLO.
<DocCallOut title="Note">
The contents of the alert details page may vary depending on the type of SLI that's defined in the SLO.
</DocCallOut>

After investigating the alert, you may want to:
41 changes: 25 additions & 16 deletions docs/en/serverless/alerting/triage-threshold-breaches.mdx
@@ -21,22 +21,31 @@ You can follow the links to navigate to the rule definition.

Explore charts on the page to learn more about the threshold breach:

![Alert details for log threshold breach](../images/log-threshold-breach.png)


* The page includes a chart for each condition specified in the rule.
These charts help you understand when the breach occurred and its severity.
* If your rule is intended to detect log threshold breaches
(that is, it has a single condition that uses a count aggregation),
you can run a log rate analysis, assuming you have the required license.
Running a log rate analysis is useful for detecting significant dips or spikes in the number of logs.
Notice that you can adjust the baseline and deviation, and then run the analysis again.
For more information about using the log rate analysis feature,
refer to the [AIOps Labs](((kibana-ref))/xpack-ml-aiops.html#log-rate-analysis) documentation.
* The page may also include an alerts history chart that shows the number of triggered alerts per day for the last 30 days.
This chart is currently only available for rules that specify a single condition.
* Timelines on the page are annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
* **Charts for each condition**. The page includes a chart for each condition specified in the rule.
These charts help you understand when the breach occurred and its severity.

![Chart for a condition in alert details for log threshold breach](../images/log-threshold-breach-condition-chart.png)

<DocCallOut title="Tip">
The timeline is annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
</DocCallOut>

* **Log rate analysis chart**. If your rule is intended to detect log threshold breaches
(that is, it has a single condition that uses a count aggregation),
you can run a log rate analysis, assuming you have the required license.
Running a log rate analysis is useful for detecting significant dips or spikes in the number of logs.
Notice that you can adjust the baseline and deviation, and then run the analysis again.
For more information about using the log rate analysis feature,
refer to the [AIOps Labs](((kibana-ref))/xpack-ml-aiops.html#log-rate-analysis) documentation.

![Log rate analysis chart in alert details for log threshold breach](../images/log-threshold-breach-log-rate-analysis.png)

* **Alerts history chart**. The next chart provides information about alerts for the same rule and group over the last 30 days.
It shows the number of those alerts that were triggered per day, the total number of alerts triggered throughout the 30 days,
and the average time it took to recover after a breach.

![Alert history chart in alert details for log threshold breach](../images/log-threshold-breach-alert-history-chart.png)

Analyze these charts to better understand when the breach started, its current
state, and how the issue is trending.
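
The figures reported by an alerts history chart (alerts per day, total alerts over 30 days, and average time to recover) can be reproduced from a list of alert records. The following is a minimal sketch under stated assumptions: the `Alert` record and the `summarize_alert_history` helper are hypothetical names invented for this example, and alerts that are still active are left out of the recovery average.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert record for one rule and group."""
    started: datetime
    recovered: Optional[datetime] = None  # None while the alert is still active


def summarize_alert_history(alerts, now, days=30):
    """Summarize alerts that started within the last `days` days."""
    cutoff = now - timedelta(days=days)
    recent = [a for a in alerts if a.started >= cutoff]

    # Number of alerts triggered on each calendar day.
    per_day = Counter(a.started.date() for a in recent)

    # Recovery time is only defined for alerts that have recovered.
    durations = [a.recovered - a.started for a in recent if a.recovered is not None]
    avg_recovery = sum(durations, timedelta()) / len(durations) if durations else None

    return {"per_day": dict(per_day), "total": len(recent), "avg_recovery": avg_recovery}
```

A rising daily count or a growing average recovery time in this summary points to an issue that is trending worse.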
Binary file modified docs/en/serverless/images/slo-burn-rate-breach.png