## Dedicated GPU & CPU Nodes
Separate nodes into those that:
* Run GPU workloads
* Run CPU workloads
* Do not run Run:ai at all. These jobs will not be monitored using the Run:ai Administration user interface.
This is actually not true: all nodes in the cluster are displayed under the Nodes tab in the Administration UI.
That includes Run:ai worker nodes, Run:ai system nodes, regular workers, and cluster masters.
All nodes that contain GPUs and have DCGM exporting metrics on them count as "GPU nodes" in the Overview dashboard.
That includes nodes that do not have the runai-container-toolkit and runai-container-toolkit-exporter DaemonSets running on them. No Run:ai pod will be scheduled on such nodes, but they are still counted.
Review node names using `kubectl get nodes`. For each such node run:
```
runai-adm set node-role --gpu-worker <node-name>
```
or
```
runai-adm set node-role --cpu-worker <node-name>
```
Nodes not marked as GPU worker or CPU worker will not run Run:ai at all.
That's also not true: nodes that are marked neither as GPU workers nor as CPU workers will run any kind of Run:ai workload.
The same behavior will be achieved if both roles are assigned to a node.
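For anyone applying the quoted per-node commands to several nodes, the loop can be sketched as a small dry-run helper. This is only an illustration: the function name `print_role_commands` and the node names `node-a` and `node-b` are hypothetical, and the only `runai-adm` flags used are the two quoted above. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one "runai-adm set node-role" command per node.
# print_role_commands is a hypothetical helper, not part of runai-adm.
print_role_commands() {
  local role="$1"
  shift
  # Accept only the two role flags quoted in the documentation above.
  case "$role" in
    gpu-worker|cpu-worker) ;;
    *) echo "unsupported role: $role" >&2; return 1 ;;
  esac
  local node
  for node in "$@"; do
    printf 'runai-adm set node-role --%s %s\n' "$role" "$node"
  done
}

# Hypothetical node names; take the real ones from `kubectl get nodes`.
print_role_commands gpu-worker node-a node-b
```

Once the printed commands look right, they can be run one at a time or piped to `sh`.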
The statements quoted above come from the node roles documentation.