Description
To get notified, and ideally mitigate the issue, before a provisioning/deprovisioning operation reports its final timeout failure, we'd like metrics for the result of each step of the provisioning/deprovisioning operation.
With step result metrics, we can set up an alert that fires when provisioning/deprovisioning keeps failing at a certain step. The alert lets us know about the step failure so we can mitigate the issue before the provisioning/deprovisioning hits its final timeout failure.
We had operation step result metrics in v1, but they were removed in v2. We could add them back and assign proper labels to the step result metrics.
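A minimal sketch of what a labeled step result metric could record, assuming the labels are `operation`, `step`, and `result`. These label names, the `observe_step` helper, and the step name `create_runtime` are hypothetical and not taken from the v1 code. It uses a plain `collections.Counter` in place of a real metrics client; an actual implementation would more likely use the project's metrics library (for example, a labeled Prometheus counter).

```python
from collections import Counter
from typing import Optional

# Hypothetical step result metric: counts keyed by the label tuple
# (operation, step, result), e.g. ("provisioning", "create_runtime", "failed").
step_results: Counter = Counter()


def observe_step(operation: str, step: str, error: Optional[Exception]) -> None:
    """Record one execution of a step (retries included) and its result."""
    result = "failed" if error is not None else "succeeded"
    step_results[(operation, step, result)] += 1


# Example: a hypothetical "create_runtime" step fails twice, then succeeds on retry.
observe_step("provisioning", "create_runtime", TimeoutError("backend timeout"))
observe_step("provisioning", "create_runtime", TimeoutError("backend timeout"))
observe_step("provisioning", "create_runtime", None)

print(step_results[("provisioning", "create_runtime", "failed")])     # 2
print(step_results[("provisioning", "create_runtime", "succeeded")])  # 1
```

Because every retry is counted under its own step label, a per-step failure count stays visible even while the operation as a whole has not yet reported success or failure.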
Reasons
Today we only have metrics that report the final status of provisioning/deprovisioning (succeeded or failed), so we can't get notified when one step is failing and keeps retrying until the whole operation times out. With an alert on step failures, we could mitigate the issue earlier, possibly before the operation hits its timeout failure.
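To illustrate why a step-level alert fires earlier than the final status metric, here is a small sketch assuming an alert condition of "the same step failed N times in a row". The `StepAttempt` type, the `first_alert_minute` function, the 5-minute retry interval, the threshold of 3, and the 60-minute operation timeout are all hypothetical values chosen for the example. In practice this condition would more likely be an alerting rule evaluated by the monitoring system over the step result metric.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class StepAttempt:
    operation: str  # "provisioning" or "deprovisioning"
    step: str       # hypothetical step name
    minute: int     # minutes since the operation started
    failed: bool


def first_alert_minute(attempts: Iterable[StepAttempt], threshold: int) -> Optional[int]:
    """Return the minute at which any step reaches `threshold` consecutive failures, else None."""
    streaks: Dict[Tuple[str, str], int] = {}
    for a in attempts:
        key = (a.operation, a.step)
        # A success resets the streak; a failure extends it.
        streaks[key] = streaks.get(key, 0) + 1 if a.failed else 0
        if streaks[key] >= threshold:
            return a.minute
    return None


OPERATION_TIMEOUT_MINUTES = 60  # hypothetical final timeout

# A step that is retried every 5 minutes and fails every time until the timeout.
attempts = [StepAttempt("provisioning", "create_runtime", m, True) for m in range(0, 60, 5)]
alert_at = first_alert_minute(attempts, threshold=3)
print(alert_at)  # 10
```

With only a final status metric, this operation would surface at minute 60; the step-level condition fires at minute 10, leaving time to mitigate.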