Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding api-response time alert #1557

Merged
merged 5 commits into from
Nov 6, 2024
Merged

Adding api-response time alert #1557

merged 5 commits into from
Nov 6, 2024

Conversation

jcrichlake
Copy link
Contributor

@jcrichlake jcrichlake commented Nov 6, 2024

API Response Time

This PR adds an alert if the response time for a given API is above 5 seconds

Issue

#1400

Copy link

github-actions bot commented Nov 6, 2024

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Configuration Details
Ensure that the threshold and frequency values are appropriately configured for the environment to avoid excessive alerts or missed critical issues.

operations/template/alert.tf Show resolved Hide resolved
operations/template/alert.tf Show resolved Hide resolved
operations/template/alert.tf Show resolved Hide resolved
Copy link

github-actions bot commented Nov 6, 2024

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Enhancement
Expand the notification coverage for the API response time alert

Consider adding more action groups or notification channels to the alert to ensure
that the right stakeholders are informed promptly when the alert triggers.

operations/template/alert.tf [538-539]

 action {
-  action_group_id = azurerm_monitor_action_group.notify_slack_email[count.index].id
+  action_group_id = [
+    azurerm_monitor_action_group.notify_slack_email[count.index].id,
+    azurerm_monitor_action_group.other_group.id
+  ]
 }
Suggestion importance[1-10]: 6

Why: Adding more action groups or notification channels can enhance the alert's effectiveness by ensuring broader and more appropriate stakeholder engagement. This suggestion is actionable and can significantly improve alert responsiveness.

6
Adjust the threshold value for the API response time alert to better match the performance expectations

Ensure that the threshold value for the 'HttpResponseTime' metric is appropriate for
your application's performance requirements. If 5 seconds is too lenient or too
strict, adjust it accordingly.

operations/template/alert.tf [534]

-threshold        = 5 # Value is in seconds
+threshold        = <appropriate_value> # Value is in seconds
Suggestion importance[1-10]: 5

Why: The suggestion to review and adjust the threshold value is valid as it directly impacts the sensitivity and effectiveness of the alert. However, the suggestion lacks specific guidance on determining the appropriate value, making it less actionable.

5
Best practice
Adjust the severity level of the alert to match organizational policies

Review the 'severity' level to ensure it aligns with your organization's alerting
policies. Severity 2 might not be appropriate depending on the criticality of the
API performance.

operations/template/alert.tf [526]

-severity            = 2
+severity            = <appropriate_severity_level>
Suggestion importance[1-10]: 5

Why: Reviewing and adjusting the severity level to align with organizational policies is a good practice to ensure that alerts are prioritized correctly. However, like other suggestions, it lacks specific actionable details.

5
Possible issue
Ensure the metric name and namespace are correctly set for effective monitoring

Verify that the 'metric_namespace' and 'metric_name' are correctly configured for
the resources you are monitoring. Incorrect values can lead to the alert not
functioning as expected.

operations/template/alert.tf [530-531]

-metric_name      = "HttpResponseTime"
-metric_namespace = "microsoft.web/sites"
+metric_name      = "<correct_metric_name>"
+metric_namespace = "<correct_metric_namespace>"
Suggestion importance[1-10]: 4

Why: Verifying the correctness of the metric name and namespace is crucial for the functionality of the alert. However, the suggestion does not provide a method to verify or specific issues to look for, making it less practical.

4

Copy link

sonarqubecloud bot commented Nov 6, 2024

metric_namespace = "microsoft.Web/sites"
aggregation = "Average"
operator = "GreaterThan"
threshold = 5 # Value is in seconds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda curious on why 5 seconds? If it's a good reason, adding a comment could be useful for future selves

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a number I just picked. It's slightly on the high side but we had some iffy latency spikes in the last week well above that number. We can potentially tweak this value later.

@jcrichlake jcrichlake merged commit d87edbf into main Nov 6, 2024
21 checks passed
@jcrichlake jcrichlake deleted the story/1400-API-response branch November 6, 2024 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants