Creating jobspec via the Python API for cores and GPUs that share a socket #3152
The following script (specifically the `insert_socket` helper) is one way to do this:

```python
#!/usr/bin/flux python
import flux
from flux.job import JobspecV1


def insert_socket(resource_list):
    # Walk the resource tree depth-first and wrap the first "slot" vertex in a
    # "socket" vertex so the slot's cores and GPUs end up under one socket.
    for idx, resource in enumerate(resource_list):
        try:
            if resource["type"] == "slot":
                resource_list[idx] = {
                    "type": "socket",
                    "count": 1,
                    "with": [resource],
                }
                return
            else:
                insert_socket(resource["with"])
        except (IndexError, KeyError):
            return


jobspec = JobspecV1.from_command(
    ["hostname"],
    num_tasks=1,
    cores_per_task=2,
    gpus_per_task=2,
    num_nodes=1,
)
insert_socket(jobspec.jobspec["resources"])
print(jobspec.dumps())
```
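For reference, with these parameters the transformed `resources` section should look roughly like the following. This is a sketch inferred from the code above, not captured `dumps()` output, so the slot label and field order may differ:

```python
# Approximate shape of jobspec.jobspec["resources"] after insert_socket()
# (illustrative only; inferred from the code above, not actual output).
[
    {
        "type": "node",
        "count": 1,
        "with": [
            {
                "type": "socket",
                "count": 1,
                "with": [
                    {
                        "type": "slot",
                        "count": 1,
                        "label": "task",
                        "with": [
                            {"type": "core", "count": 2},
                            {"type": "gpu", "count": 2},
                        ],
                    }
                ],
            }
        ],
    }
]
```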
Currently, #3151 prevents this jobspec from being scheduled and executed. Once that issue is closed, this should work for your MuMMi use case @FrankD412.
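Once scheduling of such jobspecs works, a natural follow-on is to submit the modified jobspec directly from Python instead of printing it and piping it to the CLI. A minimal sketch, assuming a running Flux instance and reusing the `insert_socket()` helper from the script above:

```python
#!/usr/bin/flux python
# Sketch only (not from the original example): build the same socket-wrapped
# jobspec and submit it via the Python API instead of printing it.
import flux
import flux.job
from flux.job import JobspecV1

jobspec = JobspecV1.from_command(
    ["hostname"], num_tasks=1, cores_per_task=2, gpus_per_task=2, num_nodes=1
)
insert_socket(jobspec.jobspec["resources"])  # helper defined in the script above

handle = flux.Flux()                       # connect to the enclosing Flux instance
jobid = flux.job.submit(handle, jobspec)   # submit and get back the job ID
print(f"Submitted {jobid}")
```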
Replies: 1 comment
I confirmed that with the merging of #3151, this is now possible in v0.19.0 with some small modifications to the original example. It turns out that hwloc v1 doesn't put the GPUs and cores under the same socket but instead under the same numanode, so after swapping `socket` out for `numanode` in the generated jobspec, this appears to work as intended on Lassen; a sketch of that change is below.
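For concreteness, the swap amounts to changing the resource type string in the helper from the original example. A sketch of that variant (my reconstruction, not necessarily the exact `samenuma.py` used on Lassen):

```python
def insert_numanode(resource_list):
    # Same recursive walk as insert_socket() above, but wrap the slot in a
    # "numanode" vertex, since hwloc v1 groups the cores and GPUs there.
    for idx, resource in enumerate(resource_list):
        try:
            if resource["type"] == "slot":
                resource_list[idx] = {
                    "type": "numanode",
                    "count": 1,
                    "with": [resource],
                }
                return
            else:
                insert_numanode(resource["with"])
        except (IndexError, KeyError):
            return
```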
The magic run line starts with a FLUXION env var that explicitly tells the scheduler to allow numanodes in the resource graph. The driver script is:

```bash
#!/bin/bash
jobspec=$(flux python samenuma.py)
JOBID1=$(echo "$jobspec" | flux job submit)
echo "Submitted $JOBID1"
JOBID2=$(echo "$jobspec" | flux job submit)
echo "Submitted $JOBID2"

for id in $JOBID1 $JOBID2; do
    echo "Output for $id:"
    flux job attach $id
    echo "Concretized R for $id:"
    flux job info $id R | jq .
done
```