Launching a job without consuming scheduler resources #3737
Replies: 5 comments 12 replies
-
@grondo had the brilliant idea to support adding arbitrary R values into jobspec. A jobtap plugin could see this R, skip the scheduler altogether, and move the job directly into the exec system (RUN state). Obviously this would need to be restricted to the instance owner. Normal jobspec submission is preserved in this case, so that a unique FLUID (jobid) and KVS directory are created automatically, and so that the typical job lifecycle is preserved. This capability could also be leveraged by tools/debuggers to co-schedule/co-launch daemons alongside an existing job (just copy the job's R, stuff it in a jobspec, and change out the command to the debugger). Some complications were also discussed on the call.
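For the tool/debugger co-launch case, here is a minimal Python sketch of what that could look like, assuming the standard Flux Python bindings and a jobtap plugin that honors an R fragment placed under a jobspec attribute. The attribute name `system.alloc-bypass.R`, the jobid, and the `gdbserver` command are illustrative placeholders, not a confirmed interface:

```python
import json

import flux
import flux.job

h = flux.Flux()

# Look up R for an already-running job via the job-info module.
target = flux.job.JobID("fABC123")  # placeholder: jobid of the existing job
resp = h.rpc("job-info.lookup", {"id": int(target), "keys": ["R"], "flags": 0}).get()
R = json.loads(resp["R"])  # R comes back as a raw string; parse it into an object

# Build a jobspec for the debugger/daemon, reusing the target job's resources.
jobspec = flux.job.JobspecV1.from_command(["gdbserver", "..."], num_tasks=1)

# Hypothetical attribute consumed by the jobtap plugin; the plugin would skip
# the scheduler and run the job directly on the R provided here.
jobspec.setattr("system.alloc-bypass.R", R)

jobid = flux.job.submit(h, jobspec)
print(f"co-launched as {jobid}")
```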
-
Oooh, this seems like a nice idea! It seems like this could be done without altering the state diagram, as you suggest above (I think?): the job enters the SCHED state and calls the stack of jobtap plugins, one of which may satisfy the resource request based on an R fragment provided in the jobspec. Upon completion, if no resources were assigned, the job enters the priority queue as usual and an alloc request is sent... Otherwise, it proceeds directly to the RUN state. Heh, it's like a parasitic allocation.
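One way to see that the normal lifecycle is preserved either way is to watch the job's eventlog after submission. A small sketch, assuming the bindings' `flux.job.event_watch` helper; the jobid is a placeholder:

```python
import flux
import flux.job

h = flux.Flux()
jobid = flux.job.JobID("fABC123")  # placeholder jobid

# The main eventlog should show the usual sequence either way, e.g.
# submit -> depend -> priority -> alloc -> start -> finish -> release -> clean.
for event in flux.job.event_watch(h, jobid):
    print(event.timestamp, event.name, event.context)
```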
-
Proof of concept posted in PR #3740
-
Hi Fluxers, would you happen to have some Python example code to show how a job can be submitted this way? Many thanks, Andre.
-
We have a few workflows (e.g., Merlin and Swift) that would like to use Flux to launch their daemons/workers across the allocation, and then from those daemons/workers, submit parallel MPI jobs to run (intentionally oversubscribing the cores/nodes to run both the jobs and the daemons). If `flux mini` is used for both sets of launches, problems arise with overallocation in the scheduler (the same resources are being requested twice).

In the case of Merlin, the daemons communicate via RabbitMQ, so launching them with `flux exec` is workable, leaving just the MPI jobs to be run with `flux mini` and the scheduler.

In the case of Swift, both the workers and the parallel jobs use MPI. So Swift needs PMI at both levels, making `flux exec` not really an option (barring some kludges to manually launch the job shell).

What would be great in the Swift case is if the outer launch could be done using `flux mini`, but the workflow could request that the job take up no resources in the scheduler. Alternatively, maybe `flux exec` could be extended to handle the MPI case via some user-side extension (e.g., `flux exec flux job-shell ...`).
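To make the daemon/worker case concrete, here is a rough sketch under the same assumptions as the earlier snippet: a bypass-style jobtap plugin reading a hypothetical `system.alloc-bypass.R` attribute, with `merlin-worker`, `my-mpi-app`, and the task counts as placeholders. The workers are launched against the instance's full R without charging the scheduler, and the MPI jobs are then submitted normally:

```python
import json

import flux
import flux.job
import flux.kvs

h = flux.Flux()

# R for the entire instance, as published by the resource module in the KVS
# (the same value shown by `flux kvs get resource.R`).
R = flux.kvs.get(h, "resource.R")
if isinstance(R, str):
    R = json.loads(R)

# 1. Launch the workflow daemons/workers across the allocation without
#    charging the scheduler for them.  The attribute name is hypothetical;
#    it stands in for whatever the jobtap plugin from the proof of concept reads.
daemons = flux.job.JobspecV1.from_command(
    ["merlin-worker"], num_tasks=4, cores_per_task=1  # placeholder command/shape
)
daemons.setattr("system.alloc-bypass.R", R)
daemon_id = flux.job.submit(h, daemons)

# 2. Submit the MPI jobs normally; the scheduler still sees all resources as
#    free, so there is no double-counting, only intentional oversubscription.
mpi = flux.job.JobspecV1.from_command(["my-mpi-app"], num_tasks=16, cores_per_task=1)
mpi_id = flux.job.submit(h, mpi)

print(f"daemons: {daemon_id}, mpi job: {mpi_id}")
```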