Add metrics for provisioning/deprovisioning step result #1039

Open
vvxxvvxx opened this issue Aug 14, 2024 · 0 comments


vvxxvvxx commented Aug 14, 2024

Description

To be notified of, or to mitigate, a problem before a provisioning/deprovisioning operation reports its final timeout failure, we would like metrics for the result of each step of the operation.

With step result metrics, we can set up an alert that fires when provisioning/deprovisioning fails repeatedly in a particular step. The alert lets us mitigate the issue before the operation hits its final timeout failure.

We had operation step result metrics in v1, but they were removed in v2. We could add them back and assign proper labels to them.
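To make the request concrete, here is a minimal sketch of what a labeled step result metric and a "constant failure in one step" check could look like. It is an in-memory stand-in written with the Python standard library only, not the project's v1 metric and not a real metrics client. The class name, the label set (`operation`, `step`, `result`), and the step names are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict, deque


class StepResultMetrics:
    """In-memory stand-in for a labeled counter (all names are hypothetical)."""

    def __init__(self, window=5):
        self.window = window
        # Total count per (operation, step, result) label combination.
        self.counts = Counter()
        # Most recent `window` results per (operation, step).
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def observe(self, operation, step, result):
        """Record the result of one execution of a step."""
        if result not in ("succeeded", "failed"):
            raise ValueError(f"unknown result: {result!r}")
        self.counts[(operation, step, result)] += 1
        self.recent[(operation, step)].append(result)

    def failing_steps(self):
        """Return (operation, step) pairs whose last `window` results all failed."""
        return sorted(
            key
            for key, results in self.recent.items()
            if len(results) == self.window
            and all(r == "failed" for r in results)
        )


# Example: one step succeeds, another fails three times in a row.
metrics = StepResultMetrics(window=3)
metrics.observe("provisioning", "create_runtime", "succeeded")  # hypothetical step
for _ in range(3):
    metrics.observe("provisioning", "check_cluster", "failed")  # hypothetical step
```

In this sketch, `failing_steps()` plays the role of the alert condition: it flags a step as soon as it has failed `window` times in a row, which can happen well before the overall operation times out. In a real deployment the counter would presumably be exported through the project's existing metrics stack and the condition expressed as an alerting rule there.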

Reasons
Currently, the only metrics we have report the final status of provisioning/deprovisioning (succeeded or failed), so we are not notified when a single step keeps failing and retrying until the whole operation times out. With an alert on step failures, we could mitigate the issue earlier, possibly before the operation hits its timeout failure.
