Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4XX and 5XX Alert Aggregation to Total Instead of Count #1486

Merged
merged 2 commits into from
Oct 24, 2024

Conversation

somesylvie
Copy link
Contributor

@somesylvie somesylvie commented Oct 24, 2024

Description

Based on @saquino0827's finding, updated the 4XX and 5XX alerts to use Total (Sum) instead of Count for their aggregation.

Issue

#1396 and #1394.

Checklist

  • Tested 4XX in the Internal environment. Specifically, made sure that the 4XXs come in within a specific minute so that it would only work with Total (Sum) instead of also working with Count.

Co-Authored-By: Bella L. Quintero <[email protected]>
Co-Authored-By: Samuel Aquino <[email protected]>
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Metric Aggregation Change
The change from 'Count' to 'Total' aggregation for HTTP error metrics may affect alert sensitivity and frequency. This should be validated to ensure it meets operational requirements.

Metric Aggregation Change
Similar to the 4XX alert, changing the aggregation for 5XX errors from 'Count' to 'Total' could impact how often alerts are triggered. This needs careful review to ensure it aligns with the intended alerting strategy.

@@ -134,7 +134,7 @@ resource "azurerm_monitor_metric_alert" "azure_4XX_alert" {
criteria {
metric_namespace = "Microsoft.Web/sites"
metric_name = "Http4xx"
aggregation = "Count"
aggregation = "Total"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a comment explaining the reason for changing the aggregation from 'Count' to 'Total'. This will help maintain the code and understand the decision-making process behind this change. [important]

Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Possible issue
Verify and ensure compatibility of the aggregation method with the specified metrics

Ensure that the aggregation method "Total" is compatible with the metric "Http4xx"
and "Http5xx" for Azure monitoring. If "Total" is not a supported aggregation method
for these metrics, consider using an alternative method that is supported.

operations/template/alert.tf [137]

-aggregation      = "Total"
+aggregation      = "SupportedMethod"
Suggestion importance[1-10]: 7

Why: This suggestion is crucial as it ensures that the new aggregation method "Total" is compatible with the metrics "Http4xx" and "Http5xx". If incompatible, it could lead to incorrect alert behavior, making this a significant concern for the functionality of the monitoring setup.

7
Enhancement
Adjust threshold values to align with the new aggregation method

Review the threshold values for the alerts with the new "Total" aggregation method
to ensure they are still appropriate. The change from "Count" to "Total" might
require adjustments to the threshold values to maintain the alert's effectiveness.

operations/template/alert.tf [139]

-threshold        = 3
+threshold        = NewAppropriateValue
Suggestion importance[1-10]: 6

Why: Adjusting the threshold values is important to maintain the effectiveness of alerts after changing the aggregation method from "Count" to "Total". This suggestion is relevant as it directly impacts the alert's sensitivity and accuracy.

6
Performance
Review and adjust the frequency and window size settings if needed

Confirm that the alert frequency and window size settings are still optimal with the
change to "Total" aggregation. The aggregation change might affect the sensitivity
and timing of alerts.

operations/template/alert.tf [131-132]

-frequency           = "PT1M"
-window_size         = "PT1H"
+frequency           = "AdjustedFrequency"
+window_size         = "AdjustedWindowSize"
Suggestion importance[1-10]: 5

Why: This suggestion is relevant as changes in aggregation might affect how alert conditions are evaluated over time. Reviewing and possibly adjusting the frequency and window size can help in maintaining the alert system's performance and accuracy.

5
Maintainability
Review and update all configurations linked to the changed aggregation method

Ensure that all dependent configurations and alerts that might be affected by the
change from "Count" to "Total" are reviewed and updated accordingly. This includes
any linked actions or notifications based on these alerts.

operations/template/alert.tf [137]

-aggregation      = "Total"
+aggregation      = "Total" // Ensure compatibility with linked configurations
Suggestion importance[1-10]: 5

Why: Ensuring that all linked configurations and alerts are compatible with the change from "Count" to "Total" is important for system integrity and maintainability. This suggestion promotes a comprehensive review to avoid potential issues in linked alert actions or notifications.

5

Copy link

@halprin halprin changed the title switch metric aggregation to total instead of count 4XX and 5XX Alert Aggregation to Total Instead of Count Oct 24, 2024
@halprin halprin marked this pull request as ready for review October 24, 2024 19:56
@halprin halprin merged commit 1440ce6 into main Oct 24, 2024
20 checks passed
@halprin halprin deleted the switch-metric-aggregation-to-total-instead-of-count branch October 24, 2024 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants