Add initial higher limits and default behavior for private tool resource restrictions #132
Conversation
@@ -69,7 +78,7 @@
       restrict-private-tool
       (assoc :implementation (ensure-default-implementation user implementation))
       persistence/add-tool)]
-    (containers/add-tool-container tool-id (restrict-private-tool-container container))
+    (containers/add-tool-container tool-id (restrict-private-tool-container (set-private-tool-defaults container)))
The basic gist here is: set-private-tool-defaults should set anything that isn't provided to the (lower) default limits, while restrict-private-tool-container has been repurposed to restrict things down to the higher maximum limits. I've kept it so the latter will still default the values to the default limits, but I wanted to be slightly more explicit, particularly because ultimately we may want to move restrict-private-tool-container to another namespace and apply it to all tools, not just private ones.
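Roughly, the split I have in mind looks like this (just a sketch; the config accessors and container keys here are stand-ins, not the actual apps code):

;; set-private-tool-defaults: fill in anything the user didn't provide with
;; the (lower) default limits. Assumes unset values are absent keys.
(defn set-private-tool-defaults
  [container]
  (merge {:memory-limit (config/default-memory-limit)   ; hypothetical accessor
          :cpu-shares   (config/default-cpu-shares)}    ; hypothetical accessor
         container))

;; restrict-private-tool-container: clamp whatever is provided down to the
;; (higher) maximum limits, still falling back to the defaults if unset.
(defn restrict-private-tool-container
  [container]
  (-> container
      (update :memory-limit #(min (or % (config/default-memory-limit)) (config/memory-limit)))
      (update :cpu-shares   #(min (or % (config/default-cpu-shares))   (config/cpu-shares)))))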
The original intent was to restrict private tools to a maximum of the apps.tools.private.memory-limit config (16GB), whether the user provided no limit, or if the provided limit was higher than this configured max.

So is it now the intent to allow the user to set a higher limit for private tools, up to the apps.tools.memory-limit max for all tools (32GB)? Or do we still want to restrict private tools to a lower limit than public tools (of course, an admin should still be able to lift the limit to the apps.tools.memory-limit max)?
That all tools would be subject to the same effective maximum was my understanding, yeah -- with the prior limits for private tools becoming the defaults instead (and we'll also be hiding this slightly behind something saying it's advanced behaviour). There's not much benefit to adding an interface for this unless it can be adjusted both up and down, and I don't think we want to bring the defaults down (though we could -- VICE defaults to 2 cores and I forget how much memory, but I think less than the current private tool defaults).

It's also worth noting that the trajectory here is towards this being only the max, adding a field also for the minimum (defaulting to whatever value we consider acceptable as the smallest slot), and then allowing users to select a value between them at runtime. Once again there, we'd probably peg the defaults somewhere like these older values, I think.
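To make that concrete with the numbers from this thread (and the same hypothetical helpers as the sketch above), the behaviour I'm describing would be roughly:

;; No limit provided: the old private-tool limit (16GB) becomes the default,
;; which is already under the 32GB system-wide max.
(-> {} set-private-tool-defaults restrict-private-tool-container)
;; => {:memory-limit 17179869184, :cpu-shares ...}

;; User asks for 64GB: allowed past the old private max, but clamped down to
;; the 32GB apps.tools.memory-limit max.
(-> {:memory-limit (* 64 1024 1024 1024)}
    set-private-tool-defaults
    restrict-private-tool-container)
;; => {:memory-limit 34359738368, :cpu-shares ...}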
This PR is part of #123
I didn't spot anything that you missed. If you are crazy, then I'm apparently at least as crazy. 😆
The changes look good, but I had a question about the intent or policy of resource restrictions for private tools vs. all tools.
OK, if we decided to allow users to (eventually) set their private tools to use as much RAM/CPU as public tools, then these changes LGTM 👍
After discussion elsewhere I think I'm going to close this and rework it. I think the plan is:
- Add configurations much like this PR, for system maximums (and later, minimums maybe), private tool (and maybe VICE) maximums, and defaults for different types of tools.
- Set up the DB to have columns for tools' "recommended" values, copying what's currently in the maximum fields to that one, and using that (for now) for setting limits.
- Change it so apps always resolves the limits to real values to pass down (even for public tools -- use the default values if stuff is unset, otherwise use those values).
- Change defaults that currently are set elsewhere so that they yell more loudly if they're used, since they shouldn't be now.

Future work will allow also setting minimums/maximums for tools, allow choosing a value when launching analyses, and possibly adding recommended/minimum/maximum values at the app level as well. But to start it'll just be tool-level recommended values and configurable maximums resolved in the apps service.
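As a rough illustration of the third bullet (the function name and config accessors here are made up), the resolution step in apps would be something like:

;; Always hand a fully-resolved set of limits down to the submission, using
;; the configured defaults for anything the tool leaves unset.
(defn resolve-container-limits
  [{:keys [memory-limit cpu-shares] :as container}]
  (assoc container
         :memory-limit (or memory-limit (config/default-memory-limit))
         :cpu-shares   (or cpu-shares   (config/default-cpu-shares))))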
Please include max upper bounds for allowable data size in GB. This should be used to match an appropriate execute node via ClassAds in Condor. Also include CPU/GPU architecture as optional fields.
We haven't really fleshed out a way to limit disk/data size, so we'd only really be able to add some sort of "advisory" limit, or something that kills jobs outright if they exceed the size. We also haven't yet figured out GPU integration stuff at all, and probably anything dealing with that architecture and/or CPU architecture would need to be designed and done at once. Basically, what I've laid out is the first steps, and I think what you've said goes in the "future work" section :)
Yes, disk size can be advisory. We should have the ability to calculate the size of user-provided inputs.

I would recommend a catch-all place to add ClassAds that we can put on a job, appended so that users never see them. This will allow us to provide fine-grained control (as we get GPUs in Jan/Feb, or if there are specific CPU extensions needed for optimization).

Remind me: do you already have the ability to add a group name to a job, so we can reserve a set of nodes for a class/workshop, etc.?
Yeah, we already add all the groups the submitting user is part of to the classad. We don't currently support any sort of "launch as a member of X group" feature, so you're always considered as a member of every group you're part of, but that's quite sufficient for a class or workshop probably.
And, the way it's currently set up, we basically have a template that gets used to create the submit file, which has access to the full job definition. Most likely we don't benefit from doing much more than that, since we don't want to encode details of our condor configuration in our high-level models or vice versa.
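For instance (purely illustrative; the attribute names and keys here are made up, not what we actually emit), the template appending a couple of extra ClassAd attributes rendered from the job definition might look roughly like:

(require '[clojure.string :as string])

;; Extra ClassAd lines the submit-file template could append from the job
;; definition; users never see these, but the scheduler can match on them.
(defn extra-classad-lines
  [{:keys [username groups]}]
  [(format "+SubmitterUsername = \"%s\"" username)
   (format "+SubmitterGroups = \"%s\"" (string/join "," groups))])

(extra-classad-lines {:username "somebody" :groups ["workshop-2018" "vice-users"]})
;; => ["+SubmitterUsername = \"somebody\""
;;     "+SubmitterGroups = \"workshop-2018,vice-users\""]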
Agreed, and we want abstraction, i.e. submit to Condor, Kubernetes, Agave, etc.
Excellent, that works for now. We will need to expose it at some point in
the future.
This is a pretty small PR with work I did weeks ago, but coming back to it I'm not actually sure there's more that needs changing in apps. I'd appreciate a check from those who work a bit more with the apps code to make sure I'm not crazy and there's not more needed here!