RUN-17254 Prune non-existing pods from topology #75

Merged
merged 1 commit into from
Apr 4, 2024

@gshaibi gshaibi commented Apr 3, 2024

No description provided.

@gshaibi gshaibi changed the title Prune non-existing pods from topology RUN-17254 Prune non-existing pods from topology Apr 3, 2024
Comment on lines +120 to +132
for i := range nodeTopology.Gpus {
	nodeTopology.Gpus[i].Status.PodGpuUsageStatus = topology.PodGpuUsageStatusMap{}

	// Remove non-existing pods from the allocation info
	allocatingPodExists, err := isPodExist(c.kubeClient, nodeTopology.Gpus[i].Status.AllocatedBy.Pod, nodeTopology.Gpus[i].Status.AllocatedBy.Namespace)
	if err != nil {
		return fmt.Errorf("failed to check if pod %s exists: %v", nodeTopology.Gpus[i].Status.AllocatedBy.Pod, err)
	}

	if !allocatingPodExists {
		nodeTopology.Gpus[i].Status.AllocatedBy = topology.ContainerDetails{}
	}
}
Contributor

Just a thought: at scale you could have 10,000+ "gpus".
Maybe we want to run this in a pool of goroutines, say one per 100 or 1,000 GPUs.
I'm not sure how long this section takes, but it looks like it might be costly at scale.

If not, ignore.
If yes and you'd rather do it in a separate Jira, that's also OK: open the Jira and send it to me.

Contributor Author

This section runs only on initialization and is not part of the ongoing status-updater logic, so I think we can start with this and improve it later if needed.

@gshaibi gshaibi merged commit 4911439 into main Apr 4, 2024
5 checks passed
2 participants