Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alert when instance count is low #1472

Merged
merged 19 commits into from
Oct 23, 2024
Merged

Conversation

somesylvie
Copy link
Contributor

@somesylvie somesylvie commented Oct 22, 2024

Description

Alert when Azure instance count is low, fix logic for identifying non-PR environments

Issue

#1398

Co-Authored-By: Samuel Aquino <[email protected]>
Co-Authored-By: halprin <[email protected]>
Co-Authored-By: James Gilmore <[email protected]>
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Configuration Logic
The threshold logic for the alert based on environment levels needs clarification. The current setup might not be optimal for different scaling scenarios.

Commented Code
There's a commented line regarding the frequency and window size of the alert. This should be resolved or removed before merging.

operator = "LessThanOrEqual"
// This threshold is based on the autoscale settings in app.tf
// How should we tune these numbers if we've scaled up higher than the initial count of 3/1?
threshold = local.higher_environment_level ? 2.5 : 0.5

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a variable for the threshold values to make them configurable outside of the code, enhancing flexibility and maintainability. [important]

description = "Action will be triggered when the instance count is too low"
frequency = "PT1M" // Checks every 1 minute
window_size = "PT15M" // Every Check, looks back 15 minutes in history
//TBD: How frequent do we want this alert and how far do we want it to look back.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or address the commented line about frequency and window size to avoid confusion and ensure clarity in the alert's behavior. [important]


lifecycle {
# Ignore changes to tags because the CDC sets these automagically
ignore_changes = [

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure that the 'ignore_changes' lifecycle configuration is necessary for all these tags, as it might lead to overlooking important changes. [medium]

Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Enhancement
Make threshold values configurable by using a variable

Consider using a variable for the threshold values to make them configurable outside
the code, enhancing flexibility and maintainability.

operations/template/alert.tf [140]

-threshold = local.higher_environment_level ? 2.5 : 0.5
+threshold = var.instance_count_threshold
Suggestion importance[1-10]: 7

Why: Using a variable for threshold values enhances flexibility and maintainability, allowing for easier adjustments without modifying the code directly.

7
Use variables for frequency and window size to increase configuration flexibility

Replace the hardcoded frequency and window size with variables to allow easy
adjustments without modifying the code directly.

operations/template/alert.tf [129-130]

-frequency           = "PT1M"  // Checks every 1 minute
-window_size         = "PT15M" // Every Check, looks back 15 minutes in history
+frequency           = var.alert_frequency
+window_size         = var.alert_window_size
Suggestion importance[1-10]: 7

Why: Replacing hardcoded values with variables for frequency and window size allows for easier configuration adjustments and improves the code's adaptability.

7
Possible issue
Review the necessity of ignoring changes to all specified tags

Ensure that the ignore_changes lifecycle configuration is necessary for all these
tags, as it might lead to overlooking important changes.

operations/template/alert.tf [149-161]

+ignore_changes = [
+  tags["business_steward"],
+  ...
+  tags["zone"]
+]
 
-
Suggestion importance[1-10]: 5

Why: This suggestion is valid as it prompts a review of the lifecycle configuration, which could prevent overlooking important changes. However, it is less critical than functional code changes.

5
Maintainability
Address or remove the 'TBD' comment to finalize the alert configuration

Clarify or remove the 'TBD' comment regarding the frequency and window size of the
alert to ensure the code is production-ready.

operations/template/alert.tf [131]

-//TBD: How frequent do we want this alert and how far do we want it to look back.
+// Frequency and window size are set based on operational requirements.
Suggestion importance[1-10]: 4

Why: Addressing or clarifying the 'TBD' comment is important for ensuring the code is production-ready, though it's a minor maintainability improvement compared to functional code changes.

4

Co-Authored-By: Samuel Aquino <[email protected]>
Co-Authored-By: halprin <[email protected]>
Co-Authored-By: James Gilmore <[email protected]>
Co-Authored-By: Samuel Aquino <[email protected]>
Co-Authored-By: halprin <[email protected]>
Co-Authored-By: James Gilmore <[email protected]>
Co-Authored-By: Samuel Aquino <[email protected]>
Co-Authored-By: James Gilmore <[email protected]>
Co-Authored-By: Samuel Aquino <[email protected]>
Co-Authored-By: James Gilmore <[email protected]>
Co-Authored-By: Samuel Aquino <[email protected]>
Co-Authored-By: James Gilmore <[email protected]>
Co-Authored-By: Samuel Aquino <[email protected]>
Co-Authored-By: James Gilmore <[email protected]>
Co-Authored-By: jherrflexion <[email protected]>
Co-Authored-By: Samuel Aquino <[email protected]>
Co-Authored-By: James Gilmore <[email protected]>
Co-Authored-By: jherrflexion <[email protected]>
Co-Authored-By: halprin <[email protected]>
Copy link

Copy link
Member

@halprin halprin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approved. Looks right.

@somesylvie somesylvie merged commit eca4729 into main Oct 23, 2024
18 checks passed
@somesylvie somesylvie deleted the alert-when-instance-count-is-low branch October 23, 2024 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants