We had to disable Arch-BOM's "sustained" alerts after the move from New Relic to Datadog because DD cannot alert on a condition staying above or below a threshold for an extended interval. Some were recreated, but they were eventually paused or had their sensitivity turned down to the point of uselessness.
"Sustained" monitors have a counterpart in "burst" monitors. Burst monitors are for a large change over a short time interval, while sustained monitors look for a smaller change over a longer time interval. The idea is to detect slower drift that's otherwise swamped by noise.
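To make the burst/sustained distinction concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not how New Relic or Datadog evaluate monitors; the function names, thresholds, and window sizes are all hypothetical.

```python
from typing import Sequence


def burst_fires(samples: Sequence[float], threshold: float, window: int) -> bool:
    """Hypothetical burst check: mean of the last `window` samples exceeds a high threshold."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = samples[-window:]
    return len(recent) == window and sum(recent) / window > threshold


def sustained_fires(samples: Sequence[float], threshold: float, window: int) -> bool:
    """Hypothetical sustained check: every one of the last `window` samples exceeds a lower threshold."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = samples[-window:]
    return len(recent) == window and all(s > threshold for s in recent)


# Assumed example data: one error-rate sample per minute.
spike = [1.0] * 25 + [20.0] * 5   # short, large jump
drift = [4.0] * 30                # small but persistent elevation

# Assumed settings: burst = above 10 over 5 minutes, sustained = above 3 for 30 minutes.
spike_burst = burst_fires(spike, threshold=10.0, window=5)
spike_sustained = sustained_fires(spike, threshold=3.0, window=30)
drift_burst = burst_fires(drift, threshold=10.0, window=5)
drift_sustained = sustained_fires(drift, threshold=3.0, window=30)
```

With these assumed settings, the spike trips only the burst check and the drift trips only the sustained check, which is the gap the sustained monitors are meant to cover.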
Now that we have some options laid out in https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/1581023295/Options+for+Datadog+time-period+APM+monitors, we can try recreating them. We'll need to try some out and turn that document into a how-to, complete with cautions and advice. The monitors may also need to link to it to explain why they're configured in an unusual way. (For example, if we use renotification, monitors may be in an "alert" state even though they aren't firing, and we'll want the doc page to explain this.)
Acceptance Criteria:
Evaluate the options we have identified and run experiments to determine which we want to go with.
Document pros/cons, including ease of configuration, how amenable this is to Terraform vs. manual config, how much alert noise we expect, etc.
Create DD sustained monitors to match what we had in NR (or revive the ones we paused or re-tuned).
One of the burst monitors (high errors) has since been changed into a composite monitor fed by two non-alerting component monitors, and the sustained version should be converted into a similar structure.
Rework the documentation page into a how-to and explainer based on what we learned.