config for combined working and mock experiment #5295
Comments
You are explicitly configuring all your nodes with 104 cores here:

[[resource.config]]
hosts = "flux-sample[0-3],burst[0-99]"
cores = "0-103"

Also: you can name the queues whatever you want. Did you try it and it didn't work?

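A sketch of the kind of split that comment seems to be pointing at (my reading, not a config taken from this thread; the core ranges and queue names are placeholders): give the real pods and the mock hosts their own entries and core ranges, and name the queues descriptively.

```toml
# Hypothetical split (placeholder core ranges): real pods get the cores they
# actually have; the mock burst hosts get whatever faux core count you want.
[queues.online]
requires = ["online"]

[queues.offline]
requires = ["offline"]

[[resource.config]]
hosts = "flux-sample[0-3]"
cores = "0-3"
properties = ["online"]

[[resource.config]]
hosts = "burst[0-99]"
cores = "0-103"
properties = ["offline"]
```
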
The scheduler will not schedule jobs onto down nodes, no matter what. So am I right in understanding that what you want is to run a test configured with 4 "real" nodes plus a set of mock nodes? Here are some ideas/pointers:

BTW, I think a slightly less kludgy way will be possible once #5184 is merged, since then you can at least override the instance

I must have had a bug the first time - it worked this time! I just wanted to rename to
Yes correct! Okay, trying to go off of what you said - here are some experiments. First, removing the cores assignment for the burst nodes:

[sched-fluxion-qmanager]
queue-policy = "easy"
[sched-fluxion-resource]
match-format = "$MATCH_FORMAT"
[resource]
noverify = true
norestrict = true
[queues.offline]
requires = ["offline"]
[queues.online]
requires = ["online"]
[[resource.config]]
hosts = "flux-sample[0-3]"
properties = ["online"]
[[resource.config]]
hosts = "flux-sample[0-3]"
cores = "0-3"
[[resource.config]]
hosts = "burst[0-99]"
properties = ["offline"] that is angry that the resources are not defined: Jun 27 22:55:37.943311 resource.err[0]: error parsing [resource.config] array: resource.config: burst[0-99] assigned no resources
Jun 27 22:55:37.943324 resource.crit[0]: module exiting abnormally
flux-module: load resource: Connection reset by peer Trying to add them back - this is closer / almost what we want because I see the correct cores for flux-sample (yay!) however, the other jobs are unsatisfiable. [sched-fluxion-qmanager]
queue-policy = "easy"
[sched-fluxion-resource]
match-format = "$MATCH_FORMAT"
[resource]
noverify = true
norestrict = true
[queues.offline]
requires = ["offline"]
[queues.online]
requires = ["online"]
[[resource.config]]
hosts = "flux-sample[0-3]"
properties = ["online"]
[[resource.config]]
hosts = "flux-sample[0-3],burst[0-99]"
cores = "0-3"
[[resource.config]]
hosts = "burst[0-99]"
properties = ["offline"] MATCH_FORMAT=rv1 NJOBS=10 NODES/JOB=6
{
"match-format": "rv1"
}
STATE QUEUE NNODES NCORES NGPUS NODELIST
free online 4 16 0 flux-sample[0-3]
free offline 100 400 0 burst[0-99]
allocated 0 0 0
down 0 0 0
ƒVby9Ls: exception: type=alloc note=alloc denied due to type="unsatisfiable"
rv1 10 6 0.72 13.95 9718 416 196608
JOBID QUEUE USER NAME ST NTASKS NNODES TIME INFO
ƒaHGsv2 offline flux hostname CD 6 6 0.033s burst[40-45]
ƒaHGsv1 offline flux hostname CD 6 6 0.033s burst[46-51]
ƒaHGsuz offline flux hostname CD 6 6 0.032s burst[52-57]
ƒaHGsuy offline flux hostname CD 6 6 0.032s burst[58-63]
ƒaFntdg offline flux hostname CD 6 6 0.029s burst[64-69]
ƒaFntdf offline flux hostname CD 6 6 0.028s burst[70-75]
ƒaFntde offline flux hostname CD 6 6 0.025s burst[76-81]
ƒaFntdd offline flux hostname CD 6 6 0.033s burst[82-87]
ƒaEJuMH offline flux hostname CD 6 6 0.032s burst[88-93]
ƒa6txwZ offline flux hostname CD 6 6 0.032s burst[94-99]
ƒVby9Ls online flux hostname F 6 6 -
{
"t_depend": 1687906682.6507435,
"t_run": 1687906682.715299,
"t_cleanup": 1687906682.7485518,
"t_inactive": 1687906682.7660186,
"duration": 0,
"expiration": 4841506682,
"name": "hostname",
"cwd": "/tmp/workflow",
"queue": "offline",
"ntasks": 6,
"ncores": 24,
"nnodes": 6,
"priority": 16,
"ranks": "[44-49]",
"nodelist": "burst[40-45]",
"success": true,
"result": "COMPLETED",
"waitstatus": 0,
"id": 21843935235,
"t_submit": 1687906682.6404624,
"state": "INACTIVE",
"username": "flux",
"userid": 1000,
"urgency": 16,
"runtime": 0.03325295448303223,
"status": "COMPLETED",
"returncode": 0,
"dependencies": [],
"annotations": {},
"exception": {
"occurred": false
}
}

Adding back the cores to the main resource section, I get the weirdness about the flux-sample cores again:

[[resource.config]]
hosts = "flux-sample[0-3],burst[0-99]"
cores = "0-104"

STATE QUEUE NNODES NCORES NGPUS NODELIST
free online 4 420 0 flux-sample[0-3]
free offline 100 10500 0 burst[0-99]
allocated 0 0 0
down 0 0 0

(Presumably those counts follow directly from the config: "0-104" is 105 core IDs per node, so 4 × 105 = 420 and 100 × 105 = 10500.)

And a variant of the closer one - trying to add the cores to the offline spec instead - same outcome:

[sched-fluxion-qmanager]
queue-policy = "easy"
[sched-fluxion-resource]
match-format = "$MATCH_FORMAT"
[resource]
noverify = true
norestrict = true
[queues.offline]
requires = ["offline"]
[queues.online]
requires = ["online"]
[[resource.config]]
hosts = "flux-sample[0-3]"
properties = ["online"]
[[resource.config]]
hosts = "flux-sample[0-3],burst[0-99]"
cores = "0-3"
[[resource.config]]
hosts = "burst[0-99]"
properties = ["offline"]
cores = "4-103" MATCH_FORMAT=rv1 NJOBS=10 NODES/JOB=6
{
"match-format": "rv1"
}
STATE QUEUE NNODES NCORES NGPUS NODELIST
free online 4 16 0 flux-sample[0-3]
free offline 100 10400 0 burst[0-99]
allocated 0 0 0
down 0 0 0
ƒWKUoEX: exception: type=alloc note=alloc denied due to type="unsatisfiable"
rv1 10 6 1.33 7.51 194598 552 450560
JOBID QUEUE USER NAME ST NTASKS NNODES TIME INFO
ƒaV8nAj offline flux hostname CD 6 6 0.307s burst[40-45]
ƒaTentU offline flux hostname CD 6 6 0.381s burst[46-51]
ƒaTentT offline flux hostname CD 6 6 0.381s burst[52-57]
ƒaTentS offline flux hostname CD 6 6 0.357s burst[58-63]
ƒaTentR offline flux hostname CD 6 6 0.332s burst[64-69]
ƒaTentQ offline flux hostname CD 6 6 0.309s burst[70-75]
ƒaTentP offline flux hostname CD 6 6 0.285s burst[76-81]
ƒaQgpKh offline flux hostname CD 6 6 0.258s burst[82-87]
ƒaPCq3M offline flux hostname CD 6 6 0.218s burst[88-93]
ƒaMiqm1 offline flux hostname CD 6 6 0.173s burst[94-99]
ƒWKUoEX online flux hostname F 6 6 -
{
"t_depend": 1687906880.5886397,
"t_run": 1687906881.019693,
"t_cleanup": 1687906881.3267233,
"t_inactive": 1687906881.3751292,
"duration": 0,
"expiration": 4841506880,
"name": "hostname",
"cwd": "/tmp/workflow",
"queue": "offline",
"ntasks": 6,
"ncores": 624,
"nnodes": 6,
"priority": 16,
"ranks": "[44-49]",
"nodelist": "burst[40-45]",
"success": true,
"result": "COMPLETED",
"waitstatus": 0,
"id": 21978152960,
"t_submit": 1687906880.5760133,
"state": "INACTIVE",
"username": "flux",
"userid": 1000,
"urgency": 16,
"runtime": 0.30703043937683105,
"status": "COMPLETED",
"returncode": 0,
"dependencies": [],
"annotations": {},
"exception": {
"occurred": false
}
}

So that technically is closest to what we want, but we would need an override somewhere that says "allow me to schedule resources that I don't have."

Yeah, totally! I can wait until that is merged (I'm already watching it) and then try the above again. Sorry - I get excited about things and then dive in (and probably it might be better to wait sometimes).

Oh wait! I think I have a bug in the above - let me fix it quickly. Update: the bug was asking for NNODES (6) for the local submit (I only have 4), so I changed that to:
But I get a weird "lost contact" error - ET phone home!
And the job is reported as failed:

rv1 10 6 1.37 7.32 194598 548 446464
JOBID QUEUE USER NAME ST NTASKS NNODES TIME INFO
ƒcGuuts offline flux hostname CD 6 6 0.214s burst[40-45]
ƒcFRvcZ offline flux hostname CD 6 6 0.390s burst[46-51]
ƒcFRvcY offline flux hostname CD 6 6 0.390s burst[52-57]
ƒcFRvcX offline flux hostname CD 6 6 0.378s burst[58-63]
ƒcDwwLG offline flux hostname CD 6 6 0.364s burst[64-69]
ƒcDwwLF offline flux hostname CD 6 6 0.350s burst[70-75]
ƒcDwwLE offline flux hostname CD 6 6 0.336s burst[76-81]
ƒcDwwLD offline flux hostname CD 6 6 0.310s burst[82-87]
ƒcDwwLC offline flux hostname CD 6 6 0.282s burst[88-93]
ƒcDwwLB offline flux hostname CD 6 6 0.233s burst[94-99]
ƒXrS49D online flux hostname F 1 1 0.003s flux-sample3

The job detail doesn't give more info:

In that example, the

Your second attempt looks good to me:
I couldn't find the unsatisfiable job request here. Are you sure you were submitting the jobs to the

This was a bug on my part - I was asking for 6 nodes but I only had 4. When I fixed that:
Ah, ok. Looks like that broker rank went away (flux-sample-3), or somehow the job-exec module otherwise got

What does flux overlay status show?

Ah, so you led me down the path to debugging this!
I realized we were only actually running the job with one task slot instead of all four nodes, so I changed:

- flux batch -n1 ./combined/start.sh
+ flux batch -N 4 ./combined/start.sh

And now - boum!

$ cat overlay-status.txt
0 flux-sample-0: full
├─ 1 flux-sample-1: full
│ └─ 3 flux-sample-3: full
└─ 2 flux-sample-2: full

flux@flux-sample-0:/tmp/workflow$ cat resource-status.txt
STATE UP NNODES NODELIST
avail ✔ 4 flux-sample[0-3]
avail* ✗ 100 burst[0-99]

So that works! And that's the outcome we'd want for this early testing. But a question - given that I don't want to pass forward all the resources of the parent to the child instance, how would I know which subset are selected? E.g., if I do:

$ flux batch -N 2 ./combined/start.sh

How would I know which of flux-sample[..] to write into the broker.toml?

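Not an authoritative answer, but one way that subset could presumably be discovered is from inside start.sh itself, by asking the child instance for the hosts it actually received and splicing them into the generated broker.toml. A sketch, assuming the hostlist broker attribute is available in this flux-core version and keeping the burst[0-99] naming from above:

```sh
# Inside start.sh (running in the child instance created by flux batch -N 2):
# ask the instance which hosts it was actually given, e.g. "flux-sample[0-1]".
real_hosts=$(flux getattr hostlist)

# Hypothetical templating step - combine the real hosts with the faux burst
# nodes when writing the child broker.toml resource.config hosts line.
echo "hosts = \"${real_hosts},burst[0-99]\""
```
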
Here is the current state of our experiments: https://github.com/flux-framework/flux-operator/tree/child-broker-experiment/examples/experimental/child-broker#combined

I think the next step is either to discuss:

I think we'd want the burst nodes to be flagged as

There are two methods
You can't schedule a job without it attempting to be run (or simulated to be run, as with the mock execution). Or do you mean submit a job and have it stay in the SCHED state (i.e. pending) while the

Since the

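For what it's worth, one way to watch a job simply sit in SCHED is to stop allocation on its queue - a hedged illustration only, which may or may not be the mechanism being alluded to above, with the exact CLI spellings assumed from a recent flux-core:

```sh
# Stop allocation on the offline queue so newly submitted jobs stay pending.
flux queue stop --queue=offline

# Submit a job to that queue; with allocation stopped it should remain in the
# SCHED (pending) state rather than being run or rejected.
flux submit --queue=offline -N6 hostname

# The job should show up as pending in the listing until the queue is started.
flux jobs
```
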
@grondo the

Hmm, could this be a bug? I definitely removed that:

STATE QUEUE NNODES NCORES NGPUS NODELIST
free online 3 9 0 flux-sample-[1-3]
free offline 100 10300 0 burst[0-99]
allocated 0 0 0
down 0 0 0

0 flux-sample-1: full
├─ 1 flux-sample-2: full
└─ 2 flux-sample-3: full
STATE UP NNODES NODELIST
avail ✔ 3 flux-sample-[1-3]
avail* ✗ 100 burst[0-99]

Could it be the flag

Jun 28 18:22:27.869159 job-exec.err[0]: ƒ2eExs1q: exec_kill: any (rank 4294967295): No such file or directory
ƒ2c4wv4X: exception: type=exec note=lost contact with job shell on broker (null) (rank 97)

And the nodes are still not in the

Yes, that's exactly what we want! For bursting, we will have these potential nodes defined, and in the same way we add faux nodes to a starting broker (and can schedule a job that doesn't have current resources but doesn't fail), we want to be able to pass that on to a child broker (as in this use case).

Oh, do you have the latest flux-sched? The bug where all ranks but 0 were marked up when running an instance
No, that attribute has nothing to do with scheduling; it just enables the mock execution implementation, which simulates a job being executed but doesn't run any job shells.

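For reference, this is roughly what using that mock-execution attribute looks like - a sketch only; the exact attribute path (system.exec.test.run_duration) and the duration value are my assumptions, not something confirmed in this thread:

```sh
# Hypothetical mock-execution submission: job-exec's test implementation
# simulates the job for run_duration instead of launching any job shells.
flux submit -N6 --queue=offline \
    --setattr=system.exec.test.run_duration=1s hostname
```
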
Oup, probably not! I will rebuild my base container and try with it.

To follow up on the discussion started here:
flux-framework/flux-sched#1009 (comment)
I'm trying to get things working so that I can run a batch job that has a combination of working (real) nodes and some that are mocked (don't work). We will eventually want the nodes that are mocked to not accept jobs, period, but that is a next step. Right now I'm trying to get one set actually running, and one not. I'm doing this work here: https://github.com/flux-framework/flux-operator/tree/child-broker-experiment/examples/experimental/child-broker#combined
I think I'm close - because I've added two queues (batch and debug, not sure why I can't name them something else?) and then put each respective group (flux-sample for real, burst for fake) into the queues, but for some reason flux thinks the batch queue has a lot more cores than it does! Here is the start.sh:

and what happens:

Note that the "real" hostname submit fails, and the fake ones are OK. I think the issue is the 416 cores (indeed, my local machine doesn't have that many!). So, questions:
run_duration flag? But I don't want it to fail with "unsatisfiable" - I would want the jobs to be scheduled if the mock resources could potentially support them! Thank you!