Autoscaling #1319

Open
bkmgit opened this issue Apr 12, 2021 · 5 comments

Comments

@bkmgit

bkmgit commented Apr 12, 2021

Slurm supports autoscaling, which is very helpful for cloud deployments. Could this be included and made relatively easy to configure?

@ChrisDowning
Contributor

Hi @bkmgit - we have a "Cloud Working Group" tackling this right now. The goal is to have a cloud equivalent of the current on-premises recipes, skipping the unneeded parts (Warewulf/xCAT) and instead handling automated scale up/down of compute nodes, along with other considerations (which instance types make sense, what storage to use, etc).

I'll drop another message here when there is something for you to try out.

@bkmgit
Author

bkmgit commented Apr 12, 2021

@ChrisDowning Thanks. A mailing list may be helpful, as indicated at openhpc/cloudwg#13.
Hopefully community contributions to the design and implementation of the cloud equivalent will also be considered. This may also be useful for HPC Carpentry (hpc-carpentry/coordination#42).

@sjpb

sjpb commented Apr 15, 2021

@ChrisDowning I'd definitely be interested in hearing what happens too. We have done proof-of-concept work on Slurm autoscaling using OpenHPC before, although it's not ready for production.

@ChrisDowning
Contributor

@sjpb Great - will keep you in the loop. I've deployed auto-scaling using Slurm power-saving for customers a few times over the last ~18 months, just never using the OpenHPC build. Deploying the same basic functionality using the OpenHPC Slurm package is pretty trivial, so we just need to get it documented first, then move on to the "best practices" and other considerations people might not be aware of if they are new to cloud.
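
For readers new to the power-saving approach mentioned above, here is a rough sketch of what a resume hook might look like. It is not the setup described in this thread, which has not been documented yet. It assumes Slurm's power-saving mechanism, where the `ResumeProgram` set in `slurm.conf` is invoked with a hostlist expression naming the nodes to bring up. The `launch_instance` function is a hypothetical placeholder for whatever cloud API call a site would use, and `expand_hostlist` handles only a single bracket group; a real script would more likely call `scontrol show hostnames`.

```python
import re
import sys
from typing import List


def expand_hostlist(expr: str) -> List[str]:
    """Expand a simple Slurm hostlist such as 'compute[1-3,7]'.

    Simplified on purpose: supports one bracket group only.
    """
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr)
    if match is None:
        return [expr]  # plain hostname with no range
    prefix, body = match.groups()
    hosts = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a single index such as "7"
        width = len(start)  # preserve zero padding, e.g. "08"
        for n in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{n:0{width}d}")
    return hosts


def launch_instance(hostname: str) -> str:
    """Hypothetical stand-in for a cloud provider call (assumption)."""
    return f"would launch instance for {hostname}"


def resume(expr: str) -> List[str]:
    """Request one instance per node named in the hostlist expression."""
    return [launch_instance(host) for host in expand_hostlist(expr)]


if __name__ == "__main__":
    # Slurm passes the hostlist expression as the first argument.
    if len(sys.argv) == 2:
        for line in resume(sys.argv[1]):
            print(line)
```

A matching `SuspendProgram` would do the reverse (terminate instances), with `SuspendTime` in `slurm.conf` controlling how long a node sits idle before it is powered down.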


github-actions bot commented Aug 3, 2024

A friendly reminder that this issue had no activity for 30 days.


3 participants