

OpenStack COS: Ignore openstack online runners #288

Merged: 3 commits, merged on Jun 3, 2024


@cbartz (Collaborator) commented on May 30, 2024

Applicable spec: n/a

Overview

Ignore online OpenStack runners when extracting metrics.

Rationale

Online runners must be ignored because their metrics may not have been pulled yet.

We have seen a case in production where a runner had no status at all in GitHub, but its OpenStack instance was still up. Metrics extraction then ran for that runner and removed its metrics storage. On the next reconciliation the code tried to pull the metrics from the server and raised a GetMetricsStorageError, because the metrics storage could no longer be found:

2024-05-29 06:59:03 INFO unit.openstack-arm-large/21.juju-log server.go:325 Found 0 online and 0 offline openstack runners, 0 of the runners are busy
2024-05-29 07:04:21 WARNING unit.openstack-arm-large/21.juju-log server.go:325 Unable to SSH into openstack-arm-large-21-fb2158b3a5b82820be9c1e79 with address 10.145.225.121
2024-05-29 07:08:35 DEBUG unit.openstack-arm-large/21.juju-log server.go:325 Found openstack instances: [openstack.compute.v2.server.Server(id=ff598200-6eaf-4c49-a2ae-f74a60ecd738, name=openstack-arm-large-21-fb2158b3a5b82820be9c1e79, status=ACTIVE, ...
2024-05-29 07:09:20 DEBUG unit.openstack-arm-large/21.juju-log server.go:325 Extracting metrics from metrics storage for runner openstack-arm-large-21-fb2158b3a5b82820be9c1e79
2024-05-29 07:09:20 WARNING unit.openstack-arm-large/21.juju-log server.go:325 pre-job-metrics.json not found for runner openstack-arm-large-21-fb2158b3a5b82820be9c1e79.
2024-05-29 07:09:20 DEBUG unit.openstack-arm-large/21.juju-log server.go:325 Cleaning metrics storage for runner openstack-arm-large-21-fb2158b3a5b82820be9c1e79
2024-05-29 07:21:52 INFO unit.openstack-arm-large/21.juju-log server.go:325 Pulling metrics and deleting server for OpenStack runner openstack-arm-large-21-fb2158b3a5b82820be9c1e79
2024-05-29 07:21:52 ERROR unit.openstack-arm-large/21.juju-log server.go:325 Failed to get shared metrics storage for runner openstack-arm-large-21-fb2158b3a5b82820be9c1e79, will not be able to issue all metrics.
2024-05-29 09:21:53.069   File "/var/lib/juju/agents/unit-openstack-arm-large-21/charm/src/openstack_cloud/openstack_manager.py", line 1352, in _pull_metrics
2024-05-29 09:21:53.069     storage = metrics_storage.get(instance_name)
2024-05-29 09:21:53.069   File "/var/lib/juju/agents/unit-openstack-arm-large-21/charm/src/metrics/storage.py", line 137, in get
2024-05-29 09:21:53.069     raise GetMetricsStorageError(f"Metrics storage for runner {runner_name} not found.")
2024-05-29 09:21:53.070 errors.GetMetricsStorageError: Metrics storage for runner openstack-arm-large-21-fb2158b3a5b82820be9c1e79 not found.
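
The failure sequence in the log can be reduced to a small sketch. Everything in it is hypothetical: `InMemoryMetricsStorageManager`, `extract_metrics` and `pull_metrics` are illustrative stand-ins, not the charm's `metrics.storage` module or `_pull_metrics`; only the `GetMetricsStorageError` name and message mirror the traceback. It shows how cleaning a runner's storage during extraction, before the server's metrics were pulled, makes the later pull fail.

```python
"""Hypothetical reproduction of the extract-then-pull ordering problem."""


class GetMetricsStorageError(Exception):
    """Raised when no metrics storage exists for a runner (name mirrors the traceback)."""


class InMemoryMetricsStorageManager:
    """Hypothetical in-memory stand-in for the shared metrics storage."""

    def __init__(self) -> None:
        self._storages: dict[str, dict] = {}

    def create(self, runner_name: str) -> dict:
        self._storages[runner_name] = {}
        return self._storages[runner_name]

    def get(self, runner_name: str) -> dict:
        try:
            return self._storages[runner_name]
        except KeyError:
            raise GetMetricsStorageError(
                f"Metrics storage for runner {runner_name} not found."
            ) from None

    def delete(self, runner_name: str) -> None:
        self._storages.pop(runner_name, None)


def extract_metrics(manager: InMemoryMetricsStorageManager, runner_name: str) -> dict:
    # Extraction reads whatever is in the storage (here: nothing, because the
    # metrics were never pulled) and then cleans the storage up.
    metrics = dict(manager.get(runner_name))
    manager.delete(runner_name)
    return metrics


def pull_metrics(manager: InMemoryMetricsStorageManager, runner_name: str) -> None:
    # The next reconciliation tries to copy the server's metrics into the
    # storage, which no longer exists.
    storage = manager.get(runner_name)
    storage["pre-job-metrics.json"] = "{}"  # hypothetical payload


if __name__ == "__main__":
    manager = InMemoryMetricsStorageManager()
    runner = "openstack-arm-large-21-fb2158b3a5b82820be9c1e79"
    manager.create(runner)
    extract_metrics(manager, runner)  # runner's server is still up, metrics not pulled
    try:
        pull_metrics(manager, runner)
    except GetMetricsStorageError as err:
        print(err)
```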

Juju Events Changes

n/a

Module Changes

openstack_cloud.openstack_manager.OpenstackRunnerManager._issue_runner_metrics: Now lists the OpenStack servers and excludes runners whose server is in the ACTIVE state from metrics extraction.
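
A minimal sketch of the selection logic described above, assuming each listed server exposes `name` and `status` (as the `openstack.compute.v2.server.Server` objects in the log do; in practice the list could come from openstacksdk's `conn.compute.servers()`). The `Server` dataclass, `runners_to_ignore`, `runners_to_extract` and the unit-name prefix convention are hypothetical and not taken from the charm's code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Server:
    """Hypothetical stand-in for openstack.compute.v2.server.Server (only the fields used here)."""

    name: str
    status: str


def runners_to_ignore(servers: Iterable[Server], prefix: str) -> set[str]:
    """Names of this unit's runners whose OpenStack server is still ACTIVE."""
    return {s.name for s in servers if s.name.startswith(prefix) and s.status == "ACTIVE"}


def runners_to_extract(storage_runner_names: Iterable[str], ignore: set[str]) -> list[str]:
    """Runners whose metrics storage may be extracted (and afterwards cleaned)."""
    return sorted(name for name in storage_runner_names if name not in ignore)


if __name__ == "__main__":
    servers = [
        Server("openstack-arm-large-21-aaa", "ACTIVE"),   # still up: skip
        Server("openstack-arm-large-21-bbb", "SHUTOFF"),  # not ACTIVE: extract
    ]
    ignore = runners_to_ignore(servers, prefix="openstack-arm-large-21")
    print(runners_to_extract(["openstack-arm-large-21-aaa", "openstack-arm-large-21-bbb"], ignore))
```

With this filter, a runner like the one in the log (no GitHub status, but `status=ACTIVE` in OpenStack) is skipped during extraction, so its storage is left in place for the "Pulling metrics and deleting server" step.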

Library Changes

Checklist

@cbartz added the bug (Something isn't working) and trivial labels on May 30, 2024
@cbartz changed the base branch from main to feat/openstack-integration on May 30, 2024 14:01
@cbartz marked this pull request as ready for review on May 31, 2024 07:49
@cbartz requested a review from a team as a code owner on May 31, 2024 07:49
@yanksyoon (Collaborator) left a comment


LGTM!

@cbartz merged commit ef9fd69 into feat/openstack-integration on Jun 3, 2024 (3 of 14 checks passed)
@cbartz deleted the fix/cos-integration-openstack branch on Jun 3, 2024 13:28