DATs were an early use case that motivated Flux's recursive design. The idea is that the DAT owner would be the instance owner of a Flux instance submitted with a reservation to run for some specific time period. I think we do need to circle back and nail down some details, e.g.:
I think we know how to set up reservations, but the Fluxion team may need to respond to that one.
-
I guess if I can wrap my head around user jobs being single-user Flux instances inside a multi-user instance, I can wrap my head around DATs being a multi-user Flux instance inside a multi-user instance :) The question I'd like to add to the ones @garlick asked above is how we do the "advance" part of the advance reservation. That is, can we tell Flux that a job/instance needs to start on a certain set of resources at 5:00 tomorrow, so it should not schedule anything else on those resources that would still be running then, regardless of priority? I guess this might just be a combination of --begin and an "expedite" QoS (make this job/instance the highest priority) on the DAT instance.
-
A few more details/questions I think we need to answer about DATs:
-
Once Fluxion is able to create allocations into the future, jobs that cannot be backfilled ahead of such an allocation simply won't be able to run on those resources. I think we get this for free.
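To make the backfill point concrete, here is a toy Python model of the rule being described. It is not Fluxion's algorithm or API; `Reservation`, `Job`, and `can_start_now` are hypothetical names used only for illustration. The assumption is that once a reservation exists in the future, a free node inside it is usable only by a job whose requested walltime guarantees it finishes before the reservation starts (or that starts after it ends), independent of job priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical DAT reservation: `nodes` are held exclusively over [start, end)."""
    nodes: frozenset
    start: int  # seconds, on the same clock as `now`
    end: int


@dataclass(frozen=True)
class Job:
    name: str
    nnodes: int
    walltime: int  # requested duration in seconds


def can_start_now(job, now, free_nodes, reservation):
    """Return the nodes `job` would get if started at `now`, or None if it must wait.

    A free node inside the reservation is usable only if the job is guaranteed
    to finish before the reservation begins (i.e. it can be backfilled), or if
    the reservation is already over. Priority is deliberately not consulted.
    """
    finishes_before_dat = now + job.walltime <= reservation.start
    dat_is_over = now >= reservation.end
    usable = sorted(
        n for n in free_nodes
        if n not in reservation.nodes or finishes_before_dat or dat_is_over
    )
    if len(usable) < job.nnodes:
        return None
    return usable[: job.nnodes]


# Example: 8 free nodes; nodes 4-7 are reserved for a DAT starting in one hour.
dat = Reservation(nodes=frozenset(range(4, 8)), start=3600, end=18000)
free = set(range(8))
print(can_start_now(Job("short", 6, 1800), 0, free, dat))  # [0, 1, 2, 3, 4, 5]
print(can_start_now(Job("long", 6, 7200), 0, free, dat))   # None
```

The 30-minute job can borrow reserved nodes because it ends before the DAT begins, while the 2-hour job of the same size has to wait, however high its priority is.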
-
For the first cut, I would say we look at what users do and try to match that as the baseline. We can extend it later. Perhaps users could pass their batch jobs as the initial program of this DAT job, which would be equivalent to "presubmitting" their jobs? The initial program should not drain the queue, so users who want to submit more jobs to it can still do so. They will need to point FLUX_URI at this DAT instance, though. This could be communicated to users either via a front-end trick or via the hotline protocol. Just a thought.
-
@ryanday36: Under SLURM, do you allow multiple users (each with a distinct Unix ID) to run during a DAT by permitting them with
-
@ryanday36: how would you rank the priority of this capability against other scheduler items? We will probably want to pull the issues we plan to fix for our Oct release soon, and knowing priorities will help with that.
-
On the Kanban board that @SteVwonder put up on Thursday (https://github.com/orgs/flux-framework/projects/27), I put this in the medium-priority column. We should have a larger discussion of this at some point, but I basically prioritized things related to configuring clusters (node exclusivity, job/user limits) and basic functionality (the ability to modify jobs after submission) as high priority, and job submission options that get used often (--depend, --begin) as medium priority. My feeling is that we definitely want the medium-priority items done before we roll out to more than friendly users, since they're going to significantly color users' first impression of Flux.
-
I wanted to start a discussion, or at least a place for discussion, on how we're going to run DATs / DSTs (giving one-ish user exclusive access to a set of nodes for a given time) under flux. The traditional way we do this is by creating an advance reservation, which seems preferable from an operational perspective, but I could also see a possibility of doing something complicated with queues / partitions.