Unlabeled Node Roles #234

Open
doronkg opened this issue Apr 3, 2023 · 0 comments

doronkg commented Apr 3, 2023

The node roles documentation states the following:

## Dedicated GPU & CPU Nodes

Separate nodes into those that:

* Run GPU workloads
* Run CPU workloads
* Do not run Run:ai at all. These jobs will not be monitored using the Run:ai Administration User Interface.

This is not accurate: all nodes in the cluster are displayed under the Nodes tab in the Administration UI.
That includes Run:ai worker nodes, Run:ai system nodes, regular workers, and cluster masters.

Any node that contains GPUs and has DCGM exporting metrics on it counts as a "GPU node" in the Overview dashboard.
That includes nodes that do not have the runai-container-toolkit and runai-container-toolkit-exporter DaemonSets running on them: no Run:ai pod will be scheduled on such nodes, but they are still counted.
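One way to spot such nodes is to compare the node list against the pods of the DaemonSet. The sketch below is only an illustration: `nodes_without_pod` is a hypothetical helper, not part of Run:ai or kubectl, and it matches pods by name prefix, which assumes the DaemonSet pods are named after the DaemonSet.

```shell
# Hypothetical helper: print the nodes that host no pod whose name starts
# with a given prefix.
#   $1: pod-name prefix (for example runai-container-toolkit)
#   $2: file with one node name per line (must not be empty)
#   $3: file with "<pod-name> <node-name>" per line
nodes_without_pod() {
  awk -v prefix="$1" '
    NR == FNR { nodes[$1] = 1; next }            # first file: remember every node
    index($1, prefix) == 1 { delete nodes[$2] }  # second file: drop nodes hosting a matching pod
    END { for (n in nodes) print n }             # whatever is left has no matching pod
  ' "$2" "$3" | sort
}

# Possible usage against a live cluster:
#   kubectl get nodes -o custom-columns=NAME:.metadata.name --no-headers > nodes.txt
#   kubectl get pods -A -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName --no-headers > pods.txt
#   nodes_without_pod runai-container-toolkit nodes.txt pods.txt
```

Note that the prefix `runai-container-toolkit` also matches the exporter pods, so a node is reported only when it hosts neither of the two.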

Review node names using `kubectl get nodes`. For each such node, run:

```
runai-adm set node-role --gpu-worker <node-name>
```

or 

```
runai-adm set node-role --cpu-worker <node-name>
```

Nodes not marked as GPU worker or CPU worker will not run Run:ai at all.

That is also not accurate: nodes that are marked as neither GPU workers nor CPU workers can run any kind of Run:ai workload.
The same behavior results when both roles are assigned to a node.
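The behavior described above can be summarized as a small classification rule. This is a minimal sketch, not Run:ai code: `classify_node_role` is a hypothetical helper, and the two label keys it looks for are assumptions about how `runai-adm set node-role` marks nodes, so check them against `kubectl get nodes --show-labels` on a real cluster.

```shell
# Hypothetical helper: classify a node from its comma-separated label list,
# in the format printed by `kubectl get nodes --show-labels`.
# ASSUMPTION: the label keys below are illustrative and may differ per version.
classify_node_role() {
  labels="$1"
  gpu=no
  cpu=no
  case ",$labels," in
    *",node-role.kubernetes.io/runai-gpu-worker="*) gpu=yes ;;
  esac
  case ",$labels," in
    *",node-role.kubernetes.io/runai-cpu-worker="*) cpu=yes ;;
  esac
  case "$gpu/$cpu" in
    yes/no) echo "gpu-worker" ;;    # restricted to GPU workloads
    no/yes) echo "cpu-worker" ;;    # restricted to CPU workloads
    *)      echo "unrestricted" ;;  # neither or both roles: any Run:ai workload
  esac
}

# Possible usage:
#   classify_node_role "kubernetes.io/os=linux,node-role.kubernetes.io/runai-gpu-worker=true"
```

The point of the sketch is the last branch: a node with no role label and a node with both role labels fall into the same case.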
