Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure outage alert #1458

Closed
wants to merge 5 commits into from
Closed

Azure outage alert #1458

wants to merge 5 commits into from

Conversation

jherrflexion
Copy link
Contributor

Add a PR title

Describe what changed in this PR at a high level.

Issue

Add a link to the issue here. Consider using
closing keywords
if the this PR isn't for a story (stories will be closed through different means).

Checklist

  • I have added tests to cover my changes
  • I have added logging where useful (with appropriate log level)
  • I have added JavaDocs where required
  • I have updated the documentation accordingly

Note: You may remove items that are not applicable

Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Configuration Logic
The conditional deployment of the Azure service health alert based on the environment might need further validation to ensure it aligns with operational requirements.

Hardcoded Values
The alert configuration uses hardcoded values for alert levels and service events which might need to be configurable to adapt to different operational scenarios.


criteria {
category = "ServiceHealth"
levels = ["Error"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a variable for the alert level to allow flexibility in configuration without code changes. [important]

levels = ["Error"]
service_health {
locations = var.service_health_locations
events = ["Incident"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's recommended to externalize the service events into a variable to enhance the flexibility and maintainability of the alert configurations. [important]


lifecycle {
ignore_changes = [
tags["business_steward"],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate if the lifecycle ignore_changes configuration is necessary for all these tags, as it might lead to overlooking important changes. [medium]

Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Best practice
Adjust the lifecycle policy to avoid ignoring changes to critical compliance tags

Review the ignore_changes lifecycle policy to ensure it aligns with the operational
requirements. Ignoring changes to critical tags like security_compliance and
pii_data might lead to oversight in compliance tracking.

operations/template/alert.tf [56-68]

 ignore_changes = [
   tags["business_steward"],
   ...
-  tags["pii_data"],
-  tags["security_compliance"],
-  ...
 ]
Suggestion importance[1-10]: 8

Why: This suggestion addresses a significant issue regarding compliance and security by recommending not to ignore changes to critical tags. This could prevent potential oversights in compliance tracking, making it a high-impact suggestion.

8
Possible issue
Correct the scope of the Azure service health alert to target the appropriate resources

Ensure that the scopes field in the azurerm_monitor_activity_log_alert resource is
correctly set to target the intended Azure resources. Currently, it is set to use
the ID of an Azure Container Registry, which might not be relevant for a service
health alert.

operations/template/alert.tf [36]

-scopes = [azurerm_container_registry.registry.id]
+scopes = [data.azurerm_resource_group.group.id]
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a potential misconfiguration in the scope of the Azure service health alert, which could lead to monitoring the wrong resources. Changing the scope to the resource group could be more appropriate depending on the context.

7
Ensure the alert resource is correctly provisioned based on the environment type

Validate the conditional count logic for creating the
azurerm_monitor_activity_log_alert resource to ensure it aligns with the deployment
strategy, especially in production environments.

operations/template/alert.tf [32]

-count = local.non_pr_environment ? 1 : 0
+count = local.is_production ? 1 : 0
Suggestion importance[1-10]: 6

Why: The suggestion to review the conditional logic for resource creation is relevant, especially to ensure that alerts are appropriately provisioned in different environments. However, without specific details about the deployment strategy, the exact impact is uncertain.

6
Enhancement
Specify targeted Azure services in the alert criteria for better accuracy

Consider adding more specific service names in the services field instead of using a
wildcard. This will help in targeting alerts more accurately to the affected
services.

operations/template/alert.tf [44]

-services = ["*"]
+services = ["Compute", "Storage", "Networking"]
Suggestion importance[1-10]: 5

Why: While specifying services explicitly can enhance the accuracy of alerts, using a wildcard might be intentional to cover all services. The suggestion is valid but its impact depends on the specific monitoring needs.

5

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant