-
I'll post my response in sections (and threads) to organize points. This first post is about the terms at the top: how to describe jobs that change versus jobs that don't. We have a lot of terms (rigid vs. evolving vs. moldable vs. malleable vs. elastic), and I know we can simplify them into a few, maybe just rigid and elastic. What do the others add? I think what we might do is define rigid and elastic, and then define cases for elasticity:
In all cases, the workload manager has to orchestrate. The distinction with case 3 is that the workload manager is "shuffling around" resources to make things fair, and the application has to be able to support that (hence the need for its permission to checkpoint and restore).
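A minimal sketch of the permission handshake in case 3, assuming a hypothetical `Application` object that either agrees to be checkpointed or declines. The names `Application`, `reshuffle`, `grant_permission`, `checkpoint`, and `restore` are illustrative only and are not Flux APIs; a real checkpoint would save far more than a node count.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    """Hypothetical stand-in for a running job's application."""
    name: str
    nodes: int
    supports_checkpoint: bool
    checkpoints: list = field(default_factory=list)

    def grant_permission(self) -> bool:
        # Only an application that can checkpoint and restore lets the
        # workload manager move it.
        return self.supports_checkpoint

    def checkpoint(self) -> dict:
        # Save enough state to resume later, and record that we did so.
        state = {"name": self.name, "nodes": self.nodes}
        self.checkpoints.append(state)
        return state

    def restore(self, state: dict, nodes: int) -> None:
        # Resume from the saved state on the new allocation.
        assert state["name"] == self.name
        self.nodes = nodes


def reshuffle(app: Application, new_nodes: int) -> bool:
    """Workload-manager side: move a job to a new allocation only with its permission."""
    if not app.grant_permission():
        return False  # the job stays where it is
    state = app.checkpoint()
    app.restore(state, new_nodes)
    return True


# Example: one job can be reshuffled, the other cannot.
sim = Application("sim", nodes=8, supports_checkpoint=True)
legacy = Application("legacy", nodes=8, supports_checkpoint=False)
print(reshuffle(sim, 4), sim.nodes)        # True 4
print(reshuffle(legacy, 4), legacy.nodes)  # False 8
```

The key design point is that the decision to move a job is split: the workload manager proposes, but the application's ability (and consent) to checkpoint gates whether anything happens.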
-
A new thread for some notes from "Toward convergence in job schedulers for parallel supercomputers." Most of these are probably redundant definitions, but I might add to them. I do want to find newer papers, because this one is two decades old, and (at least I'd hope) there are newer approaches (not just theory, but paired with implementation).
Throughput
and
There are the variables of human perception (it just needs to feel fast) and then the reality that the number of actual users of a system isn't really approaching infinity. I wonder if the user not being able to know the number of other users (and their place or priority in the queue) is important here, since a large part of "is it fast enough that I'm happy" depends on that. For multiple users, in terms of Flux, we get high throughput with instances. The implication of that approach is that a single user is launching a huge number of jobs (and wants them done quickly). But why are they launching so many jobs? Doesn't that hint at an issue with the design of the jobspec, i.e., what scale it ingests per submission? The problem with this demonstration of throughput is that I don't think it actually maps to multi-tenancy, because I think it assumes one top-level instance owner who can then use child instances. It also assumes people actually submit jobs like that, and in those numbers. Is this the case? If someone wants to submit all those tiny jobs, what are they actually doing (and why couldn't it be encompassed in processes under one job)?
Adaptive partitioning vs. dynamic partitioning
And: "One common heuristic for dynamic partitioning is to strive for equal sized partitions (usually called 'equipartitioning')."
Space slicing
"An alternative is to use folding. With folding, the number of processors allocated to a job can only grow or shrink by factors of 2."
Gang scheduling
This jumped out at me because (along with coscheduling) it's a term that the cloud world uses too.
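The two partitioning heuristics quoted in this post, equipartitioning and folding, are simple enough to sketch. This is a minimal Python sketch under my own assumptions, not the paper's algorithms verbatim: `equipartition` and `fold` are hypothetical names, job demands are plain processor counts, and folded allocations are assumed to be powers of two.

```python
def equipartition(total: int, demands: list[int]) -> list[int]:
    """Split `total` processors among jobs as evenly as possible.

    No job gets more than it can use; processors a job cannot use are
    handed to the remaining jobs (a simple water-filling loop).
    """
    alloc = [0] * len(demands)
    remaining = total
    active = [i for i, d in enumerate(demands) if d > 0]
    while remaining > 0 and active:
        share, extra = divmod(remaining, len(active))
        if share == 0:
            # Fewer processors left than jobs: give one each to the first few.
            for i in active[:extra]:
                alloc[i] += 1
            break
        for i in active:
            give = min(share, demands[i] - alloc[i])
            alloc[i] += give
            remaining -= give
        # Jobs that reached their demand drop out of the next round.
        active = [i for i in active if alloc[i] < demands[i]]
    return alloc


def fold(current: int, available: int) -> int:
    """Resize a power-of-two allocation by factors of 2 only (folding).

    Halve while the allocation exceeds what is available (never below one
    processor), otherwise double while the doubled size still fits.
    """
    if current < 1 or current & (current - 1):
        raise ValueError("folding here assumes a power-of-two allocation")
    size = current
    while size > available and size > 1:
        size //= 2
    while size * 2 <= available:
        size *= 2
    return size


print(equipartition(16, [2, 10, 10]))  # [2, 7, 7]
print(fold(8, 5), fold(4, 16))         # 4 16
```

Equipartitioning treats every job the same regardless of size, which is why the small job's unused share flows to the others; folding trades that flexibility for coarse, predictable resize steps.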
Migration
We probably need this same concept, but across clusters. Checkpoint and restart?
Change job execution order
Maybe users should be rewarded not just for providing resource requirements, but for getting them right. You aren't punished if you don't (no cookies taken away), but you don't get any new cookies.
This is an interesting idea: would people be willing to accept better resources for a longer wait time? Or the inverse? We might want to consider user policies that define these preferences, e.g., "You can move me to a less optimal CPU so my job runs faster." Or what if time on a scheduler were like an auction? Each person is given some amount of time, maybe depending on funding, etc., and they are allowed to trade, sell, etc. Could it work like a market, and reach a more optimal state? Or maybe we need new models: a race where the fastest percentile gets bonus resources (and likely they will have better cognitive health and be more effective too). Or whoever gives the sysadmins the most pizza. 🍕
Clusters of assumptions
These are 20 years old; should there be new ones? When we design this descriptive stuff, what is our cost function anyway?
-
I think the wall of text that follows is both ideas and design. Since it would be nice to get a design out of it, I tagged the discussion as such.
Motivation
To lay a foundation for work on resource and task dynamism in a multi-cluster environment, I’d like to discuss concepts and specifications needed to specify and support moldable, evolving, malleable, and elastic jobs in Flux. Out of respect for some of our past selves and our Flux forebears, here’s a link to a really great and informative discussion on related topics from back in 2015(!): #354. I can’t say that I’ve read every comment, but I encourage everyone reading this to review that discussion first.
Definitions
So that we’re on the same page, here are modified and updated definitions of those terms, based on those found in the now-classic Feitelson and Rudolph, 2005:
A rigid job requires a fixed number and shape of resources in order to execute, as specified by the user at job submission.
An evolving job is one that may change its resource requirements during execution. Note that it is the application itself that initiates the changes.
A moldable job is one that can be initiated with variable resources. The resources are determined by the resource manager before job execution.
A malleable job is one that can adapt to changes (initiated by the resource manager) in its resources during execution.
(My definition) An elastic job is one that exhibits the dynamism described by the three previous job types in combination and/or dynamism in time.
I updated those definitions to generalize what changes from the number of processors to resources and their shapes. In my opinion there’s a lack of precision in discussions about characteristics and behaviors related to elasticity. Hopefully the ensuing discussion can also serve to clarify those characteristics and behaviors.
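One way to make these distinctions concrete is to read the definitions as a table of who initiates a resource change (the application or the resource manager) and when (before or during execution). The following is a minimal Python sketch under that reading. The names `Initiator`, `Phase`, `Change`, and `classify` are hypothetical, and modeling "dynamism in time" as a single boolean flag (e.g., a changing time limit) is a simplification assumed here, not part of the definitions.

```python
from dataclasses import dataclass
from enum import Enum


class Initiator(Enum):
    APPLICATION = "application"
    RESOURCE_MANAGER = "resource manager"


class Phase(Enum):
    BEFORE_START = "before execution"
    DURING_EXECUTION = "during execution"


@dataclass(frozen=True)
class Change:
    """One kind of resource change a job supports: who initiates it, and when."""
    initiator: Initiator
    phase: Phase


MOLDABLE = Change(Initiator.RESOURCE_MANAGER, Phase.BEFORE_START)
EVOLVING = Change(Initiator.APPLICATION, Phase.DURING_EXECUTION)
MALLEABLE = Change(Initiator.RESOURCE_MANAGER, Phase.DURING_EXECUTION)


def classify(changes: set[Change], time_dynamic: bool = False) -> str:
    """Map the changes a job supports onto the job types defined above.

    `time_dynamic` stands in for "dynamism in time" (assumed here to mean
    something like a changing time limit). Changes not covered by the
    definitions are ignored.
    """
    kinds = [name for name, change in (("moldable", MOLDABLE),
                                       ("evolving", EVOLVING),
                                       ("malleable", MALLEABLE))
             if change in changes]
    if time_dynamic or len(kinds) > 1:
        return "elastic"  # a combination of the above and/or dynamism in time
    return kinds[0] if kinds else "rigid"


print(classify(set()))                  # rigid
print(classify({MALLEABLE}))            # malleable
print(classify({EVOLVING, MALLEABLE}))  # elastic
```

Framing each type as a set of (initiator, phase) pairs makes "elastic" fall out as any combination rather than a separate mechanism, which matches the intent of generalizing beyond processor counts.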
New terminology and needs
In the following bullets, I’ve taken some of the graph terminology from Diestel’s Graph Theory (Fifth Edition). I don’t think we need to get into a detailed discussion of graphs (if so, cool!), but I’d like to anchor some of the items on well-defined concepts from graph theory. I say that with an eye toward research.
Those are some initial ideas to get us started. I don’t mean to imply any of these ideas are required or that I’m set on the terminology I just used. Feel free to propose other terms or redirect the discussion in a more concrete direction. I’ll get to work on preparing some YAML/JSON examples of the items above for motivation.
While not necessary for a conceptual discussion, use cases are very welcome and will help to ground our thoughts.