Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dummy split channels #1495

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/cicd.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
VPN_CA_CERTIFICATE: ${{ secrets.VPN_CA_CERTIFICATE }}
VPN_GITHUB_CERTIFICATE: ${{ secrets.VPN_GITHUB_CERTIFICATE}}
VPN_GITHUB_SECRET_KEY: ${{ secrets.VPN_GITHUB_SECRET_KEY }}
TERRAFORM_APPLY_PARAMETERS: -var="alert_slack_email=${{ secrets.ALERT_SLACK_EMAIL }}"
TERRAFORM_APPLY_PARAMETERS: -var="alert_slack_email=${{ secrets.NON_PROD_ALERT_SLACK_EMAIL }}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's recommended to encapsulate the Terraform apply parameters into a script or a separate step to enhance readability and maintainability of the workflow file. [medium]


staging-deploy:
name: Staging Application Deploy
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/dev-deploy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ jobs:
VPN_CA_CERTIFICATE: ${{ secrets.VPN_CA_CERTIFICATE }}
VPN_GITHUB_CERTIFICATE: ${{ secrets.VPN_GITHUB_CERTIFICATE}}
VPN_GITHUB_SECRET_KEY: ${{ secrets.VPN_GITHUB_SECRET_KEY }}
TERRAFORM_APPLY_PARAMETERS: -var="alert_slack_email=${{ secrets.ALERT_SLACK_EMAIL }}"
TERRAFORM_APPLY_PARAMETERS: -var="alert_slack_email=${{ secrets.NON_PROD_ALERT_SLACK_EMAIL }}"

dev-deploy:
name: Dev Application Deploy
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/internal-deploy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ jobs:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
TERRAFORM_APPLY_PARAMETERS: -var="alert_slack_email=${{ secrets.ALERT_SLACK_EMAIL }}"
TERRAFORM_APPLY_PARAMETERS: -var="alert_slack_email=${{ secrets.NON_PROD_ALERT_SLACK_EMAIL }}"

internal-deploy:
name: Internal Application Deploy
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/prod-deploy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ jobs:
VPN_CA_CERTIFICATE: ${{ secrets.VPN_CA_CERTIFICATE }}
VPN_GITHUB_CERTIFICATE: ${{ secrets.VPN_GITHUB_CERTIFICATE}}
VPN_GITHUB_SECRET_KEY: ${{ secrets.VPN_GITHUB_SECRET_KEY }}
TERRAFORM_APPLY_PARAMETERS: -var="alert_slack_email=${{ secrets.ALERT_SLACK_EMAIL }}"
TERRAFORM_APPLY_PARAMETERS: -var="alert_slack_email=${{ secrets.PROD_ALERT_SLACK_EMAIL }}"

prod-deploy:
name: Prod Application Deploy
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/terraform-ci-deploy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ jobs:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
TERRAFORM_APPLY_PARAMETERS: -var="pr_number=${{ github.event.number }}" -var="alert_slack_email=${{ secrets.ALERT_SLACK_EMAIL }}"
TERRAFORM_APPLY_PARAMETERS: -var="pr_number=${{ github.event.number }}" -var="alert_slack_email=${{ secrets.NON_PROD_ALERT_SLACK_EMAIL }}"


terraform-deploy-skip: # runs when the PR doesn't have any changes that require the PR deploy; this ensures we get the appropriate required PR checks
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/terraform-ci-destroy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -53,4 +53,4 @@ jobs:
run: terraform init -backend-config="key=pr_${{ github.event.number }}.tfstate"

- name: Terraform Destroy
run: terraform destroy -auto-approve -input=false -var="pr_number=${{ github.event.number }}" -var="alert_slack_email=${{ secrets.ALERT_SLACK_EMAIL }}"
run: terraform destroy -auto-approve -input=false -var="pr_number=${{ github.event.number }}" -var="alert_slack_email=${{ secrets.NON_PROD_ALERT_SLACK_EMAIL }}"
31 changes: 18 additions & 13 deletions adr/020-azure-alerts.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,19 +13,19 @@ Accepted.
## Context

As part of our CI/CD infrastructure, we need notifications when failures occur.
We chose Azure for alerting because it's built into the infrastructure we're already using,
which gives us easy access to metrics. We're not currently using an external
log aggregation system, so Azure alerts were a much lower lift to implement than
any of the other potential options.

To ensure rapid response to application failures within our CI/CD infrastructure, we require real-time notifications for critical issues. The current alert setup focuses on:
Alerts are configured in [alert.tf](../operations/template/alert.tf). To reduce

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To improve the documentation, consider adding a section on how to troubleshoot common issues with Azure alerts, which could be beneficial for new team members or during incident response. [medium]

unhelpful notifications, we have alerts turned off in the PR environments, so they must
either be tested in `internal` or `dev`, or developers may temporarily turn alerts back on in
their branch's PR environment.

- **Type:** [Azure Log Search Alerts](https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-types#log-alerts) for HikariCP connection failures.


- **Trigger:** Any logged failures with database connections.


- **Configuration:** Alerts are stateful (auto-mitigation); set to `fired` status to reduce noise from frequent or duplicate alerts.


- **Notification:** Alerts sent to a Slack channel via email until PagerDuty is operational.
Alerts are sent to email addresses that forward to Slack channels. As of October 2024,
production alerts go to `#production-alerts-cdc-trusted-intermediary` and non-prod alerts
go to `#non-prod-alerts-cdc-trusted-intermediary`.

## Impact

Expand All @@ -35,11 +35,16 @@ To ensure rapid response to application failures within our CI/CD infrastructure

### Negative

- Possible alert fatigue if not fine-tuned
- Azure's built-in alert options are less robust than some other services - for instance,
they don't have an option for p50/90/99 latency alert. This means we're more limited in
what kinds of alerts we can have
- Navigating from the Azure Slack alerts to the actual logs where issues are occurring
is unintuitive and requires multiple clicks. Even once you find the right logs,
Azure logs lack syntax highlighting and can be hard to read.

### Risks

- None
- Possible alert fatigue if not fine-tuned

## Related Issues

Expand Down
4 changes: 3 additions & 1 deletion operations/template/alert.tf
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,9 @@ resource "azurerm_monitor_action_group" "notify_slack_email" {
short_name = "cdcti-alerts"

email_receiver {
name = "cdcti-flexion-slack-email-receiver"
name = "cdcti-flexion-slack-email-receiver"
// This variable is set in the `env-deploy.yml` GH action for each environment
// We use a different email address/Slack channel for prod and non-prod alerts
email_address = var.alert_slack_email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a conditional check to ensure that the email_address variable is not empty before applying the configuration. This can prevent misconfiguration that might lead to missing alerts. [important]

}

Expand Down
Loading