Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test Azure outage alert #1459

Closed
wants to merge 5 commits into from
Closed

Test Azure outage alert #1459

wants to merge 5 commits into from

Conversation

jherrflexion
Copy link
Contributor

Add a PR title

Describe what changed in this PR at a high level.

Issue

Add a link to the issue here. Consider using
closing keywords
if the this PR isn't for a story (stories will be closed through different means).

Checklist

  • I have added tests to cover my changes
  • I have added logging where useful (with appropriate log level)
  • I have added JavaDocs where required
  • I have updated the documentation accordingly

Note: You may remove items that are not applicable

Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Configuration Consistency
The new alert configuration might not be consistent with existing patterns or best practices in terms of resource naming and lifecycle management.

@@ -28,6 +28,48 @@ resource "azurerm_monitor_action_group" "notify_slack_email" {
}
}

resource "azurerm_monitor_activity_log_alert" "azure_service_health_alert" {
count = local.non_pr_environment ? 1 : 0
name = "cdcti-${var.environment}-azure-status-alert"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a variable for the alert name prefix instead of hardcoding 'cdcti-' to maintain consistency and configurability across different environments. [important]

@@ -28,6 +28,48 @@
}
}

resource "azurerm_monitor_activity_log_alert" "azure_service_health_alert" {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's recommended to add a tag to the new resource for better resource management and to align with the existing infrastructure as code practices. [important]

tags["security_steward"],
tags["support_group"],
tags["system"],
tags["technical_steward"],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure that the 'ignore_changes' lifecycle policy is reviewed to confirm that it aligns with the operational requirements and does not inadvertently ignore important changes that should trigger updates. [medium]

Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Enhancement
Replace the wildcard in the services field with specific service categories to enhance alert specificity

Consider specifying more granular service categories in the service_health block
instead of using a wildcard. This can improve the alert's accuracy and relevance.

operations/template/alert.tf [44]

-services  = ["*"]
+services  = ["Compute", "Storage"]  # Specify relevant services explicitly
Suggestion importance[1-10]: 5

Why: Suggesting more specific service categories could indeed improve the alert's specificity and relevance. However, the use of a wildcard might be intentional to cover all services, so this suggestion is context-dependent.

5
Possible issue
Adjust the count condition to correctly enable or disable the alert based on the environment

Ensure that the count condition for the resource azurerm_monitor_activity_log_alert
is correctly set to enable or disable the alert based on the environment. If the
intention is to disable the alert in production environments, consider revising the
condition or adding a comment for clarity.

operations/template/alert.tf [32]

-count = local.non_pr_environment ? 1 : 0
+count = local.non_pr_environment ? 0 : 1  # Assuming the alert should be disabled in production
Suggestion importance[1-10]: 4

Why: The suggestion to adjust the 'count' condition is valid but assumes the opposite intention of the existing code without clear evidence. The existing code disables the alert in production environments, which might be intentional.

4
Ensure the scopes field accurately targets all relevant Azure resources

Verify the scopes field to ensure that it correctly targets the intended Azure
resources for monitoring. If the scope is intended to include more than the
container registry, additional resources should be added to the array.

operations/template/alert.tf [36]

-scopes = [azurerm_container_registry.registry.id]
+scopes = [azurerm_container_registry.registry.id, additional_resource.id]  # Add other required resources
Suggestion importance[1-10]: 3

Why: The suggestion to verify the scopes field is relevant, but the improved code assumes additional resources need to be monitored without specific evidence from the PR context.

3
Best practice
Add validation for service_health_locations to ensure they are valid Azure regions

Add validation to the service_health_locations variable to ensure that the provided
values are valid Azure regions. This can prevent configuration errors.

operations/template/alert.tf [42]

-locations = var.service_health_locations
+locations = var.service_health_locations  # Ensure these are valid Azure regions
Suggestion importance[1-10]: 2

Why: The suggestion to add validation is a good practice, but the improved code does not actually implement any validation mechanism; it only adds a comment which does not enforce any checks.

2

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant