Azure outage alert #1458

jherrflexion · 2024-10-18T19:09:01Z

Add a PR title

Describe what changed in this PR at a high level.

Issue

Add a link to the issue here. Consider using closing keywords if this PR isn't for a story (stories will be closed through different means).

Checklist

I have added tests to cover my changes
I have added logging where useful (with appropriate log level)
I have added JavaDocs where required
I have updated the documentation accordingly

Note: You may remove items that are not applicable

Co-Authored-By: Samuel Aquino <[email protected]>

github-actions · 2024-10-18T19:11:54Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review:
Configuration Logic: The conditional deployment of the Azure service health alert based on the environment might need further validation to ensure it aligns with operational requirements.
Hardcoded Values: The alert configuration uses hardcoded values for alert levels and service events which might need to be configurable to adapt to different operational scenarios.

github-actions · 2024-10-18T19:11:54Z

operations/template/alert.tf

+
+  criteria {
+    category = "ServiceHealth"
+    levels   = ["Error"]


Consider using a variable for the alert level to allow flexibility in configuration without code changes. [important]
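A minimal sketch of that suggestion, assuming a new input variable whose name (`service_health_alert_levels`) is hypothetical and not part of the PR. The default keeps the current `["Error"]` behavior, and the `validation` block limits entries to the severities the azurerm provider documents for `levels`.

```hcl
# Hypothetical variable: lets each environment choose which activity log
# severities trigger the Service Health alert without editing alert.tf.
variable "service_health_alert_levels" {
  description = "Activity log severity levels that trigger the Service Health alert."
  type        = list(string)
  default     = ["Error"] # same value the PR currently hardcodes

  validation {
    condition = alltrue([
      for level in var.service_health_alert_levels :
      contains(["Verbose", "Informational", "Warning", "Error", "Critical"], level)
    ])
    error_message = "Each level must be one of Verbose, Informational, Warning, Error, or Critical."
  }
}

# Inside the existing criteria block, the literal list would then become:
#   levels = var.service_health_alert_levels
```

Because the default matches today's value, existing environments would plan no changes until they override the variable.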

github-actions · 2024-10-18T19:11:54Z

operations/template/alert.tf

+    levels   = ["Error"]
+    service_health {
+      locations = var.service_health_locations
+      events    = ["Incident"]


It's recommended to externalize the service events into a variable to enhance the flexibility and maintainability of the alert configurations. [important]
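One way to externalize the event types, sketched with a hypothetical variable name (`service_health_events`). The allowed values listed in the validation are the event types the azurerm provider documents for the `service_health` block; the default preserves the PR's `["Incident"]`.

```hcl
# Hypothetical variable: which Service Health event types raise the alert.
variable "service_health_events" {
  description = "Service Health event types that trigger the alert."
  type        = list(string)
  default     = ["Incident"] # preserves the PR's current behavior

  validation {
    condition = alltrue([
      for event in var.service_health_events :
      contains(["Incident", "Maintenance", "Informational", "ActionRequired", "Security"], event)
    ])
    error_message = "Each event must be one of Incident, Maintenance, Informational, ActionRequired, or Security."
  }
}

# The service_health block in alert.tf would then read:
#   service_health {
#     locations = var.service_health_locations
#     events    = var.service_health_events
#   }
```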

github-actions · 2024-10-18T19:11:54Z

operations/template/alert.tf

+
+  lifecycle {
+    ignore_changes = [
+      tags["business_steward"],


Validate whether the lifecycle `ignore_changes` configuration is necessary for all of these tags, as ignoring them might lead to overlooking important changes. [medium]

github-actions · 2024-10-18T19:12:22Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Best practice: Adjust the lifecycle policy to avoid ignoring changes to critical compliance tags
Review the `ignore_changes` lifecycle policy to ensure it aligns with the operational requirements. Ignoring changes to critical tags like `security_compliance` and `pii_data` might lead to oversight in compliance tracking.
operations/template/alert.tf [56-68]
   ignore_changes = [
     tags["business_steward"],
     ...
-    tags["pii_data"],
-    tags["security_compliance"],
-    ...
   ]
Suggestion importance[1-10]: 8
Why: This suggestion addresses a significant issue regarding compliance and security by recommending not to ignore changes to critical tags. This could prevent potential oversights in compliance tracking, making it a high-impact suggestion.

Possible issue: Correct the scope of the Azure service health alert to target the appropriate resources
Ensure that the `scopes` field in the `azurerm_monitor_activity_log_alert` resource is correctly set to target the intended Azure resources. Currently, it is set to use the ID of an Azure Container Registry, which might not be relevant for a service health alert.
operations/template/alert.tf [36]
-scopes = [azurerm_container_registry.registry.id]
+scopes = [data.azurerm_resource_group.group.id]
Suggestion importance[1-10]: 7
Why: The suggestion correctly identifies a potential misconfiguration in the scope of the Azure service health alert, which could lead to monitoring the wrong resources. Changing the scope to the resource group could be more appropriate depending on the context.

Possible issue: Ensure the alert resource is correctly provisioned based on the environment type
Validate the conditional count logic for creating the `azurerm_monitor_activity_log_alert` resource to ensure it aligns with the deployment strategy, especially in production environments.
operations/template/alert.tf [32]
-count = local.non_pr_environment ? 1 : 0
+count = local.is_production ? 1 : 0
Suggestion importance[1-10]: 6
Why: The suggestion to review the conditional logic for resource creation is relevant, especially to ensure that alerts are appropriately provisioned in different environments. However, without specific details about the deployment strategy, the exact impact is uncertain.

Enhancement: Specify targeted Azure services in the alert criteria for better accuracy
Consider adding more specific service names in the `services` field instead of using a wildcard. This will help in targeting alerts more accurately to the affected services.
operations/template/alert.tf [44]
-services = ["*"]
+services = ["Compute", "Storage", "Networking"]
Suggestion importance[1-10]: 5
Why: While specifying services explicitly can enhance the accuracy of alerts, using a wildcard might be intentional to cover all services. The suggestion is valid but its impact depends on the specific monitoring needs.
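Service Health events are reported per subscription, and Microsoft's Service Health alert examples typically scope the rule to the subscription rather than to one resource such as the container registry. A sketch under that assumption follows. The resource label `service_health`, the alert name, and the action group `azurerm_monitor_action_group.alerts` are hypothetical; `location = "global"` is an assumption to check against the azurerm provider version in use. The `count`, criteria values, and `data.azurerm_resource_group.group` come from the PR and the review above.

```hcl
# Subscription the azurerm provider is currently authenticated against.
data "azurerm_subscription" "current" {}

resource "azurerm_monitor_activity_log_alert" "service_health" {
  # Same gating as the PR; replace with another local if the
  # environment strategy changes.
  count = local.non_pr_environment ? 1 : 0

  name                = "azure-service-health-alert" # hypothetical name
  resource_group_name = data.azurerm_resource_group.group.name
  location            = "global" # assumption: verify for your provider version
  scopes              = [data.azurerm_subscription.current.id]
  description         = "Notifies when Azure reports a Service Health incident."

  criteria {
    category = "ServiceHealth"
    levels   = ["Error"]

    service_health {
      locations = var.service_health_locations
      events    = ["Incident"]
      services  = ["*"] # wildcard kept from the PR
    }
  }

  action {
    # Hypothetical action group; the PR wires its own action_group_id.
    action_group_id = azurerm_monitor_action_group.alerts.id
  }
}
```

Scoping to the subscription keeps the rule independent of any single resource, so removing the registry later would not silently orphan the alert.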

sonarqubecloud · 2024-10-18T19:13:09Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

jherrflexion and others added 5 commits October 18, 2024 09:33

WIP Azure Outage Alert

a629617

Co-Authored-By: Samuel Aquino <[email protected]>

Attempt action_group_id fix

07fbaf3

Co-Authored-By: Samuel Aquino <[email protected]>

Removed unnecessary email_subject

e4ec0a4

Refactoring location

71c1056

Remove temp change

1741c97

jherrflexion closed this Oct 18, 2024

jherrflexion temporarily deployed to pr October 18, 2024 19:09 — with GitHub Actions Inactive

jherrflexion had a problem deploying to pr October 18, 2024 19:10 — with GitHub Actions Failure

jherrflexion had a problem deploying to pr October 18, 2024 19:11 — with GitHub Actions Failure

github-actions bot reviewed Oct 18, 2024

jherrflexion had a problem deploying to pr October 18, 2024 19:54 — with GitHub Actions Failure

jherrflexion had a problem deploying to pr October 18, 2024 20:46 — with GitHub Actions Failure

jherrflexion had a problem deploying to pr October 18, 2024 21:49 — with GitHub Actions Failure

jherrflexion had a problem deploying to pr October 21, 2024 14:28 — with GitHub Actions Failure

jherrflexion had a problem deploying to pr October 21, 2024 14:45 — with GitHub Actions Failure

jherrflexion temporarily deployed to pr October 21, 2024 14:46 — with GitHub Actions Inactive

jherrflexion had a problem deploying to pr October 21, 2024 16:04 — with GitHub Actions Failure

jherrflexion deleted the azure-outage-alert branch November 4, 2024 21:02
