google-batch quota error does not trigger job failure #303
Comments
Hi @rivershah, this appears to be working as intended. The idea is that the quota issue is resolvable (either by resources becoming available or by the user allocating more quota), after which the job continues. For example, imagine submitting 100 jobs when we only have quota for 50. Once the first 50 finish, we'd want the next 50 to run. Perhaps better documentation on this should be added.
This risks starvation. What is a graceful way to trigger a fast failure or timeout, please? For example, we submit jobs on large GPU machines, which can go without availability for days.
Ideally, you could make use of dsub's |
Excellent, requesting that we please implement this.
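Until something like this exists in the provider, a caller could enforce its own queue deadline. The following is only a minimal sketch of that idea, not dsub's API: `wait_or_fail`, `QueueTimeout`, the `get_status` callable, and the `"QUEUED"` status string are all hypothetical names chosen for illustration, and how the status is actually obtained is left to the caller.

```python
import time
from typing import Callable


class QueueTimeout(Exception):
    """Raised when a job is still queued after the allowed waiting time."""


def wait_or_fail(
    get_status: Callable[[], str],
    max_queue_seconds: float,
    poll_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it leaves the queue, or fail fast at the deadline.

    get_status is a hypothetical caller-supplied function returning the
    job's current state as a string; "QUEUED" is an assumed label for a
    job that is waiting on quota or machine availability.
    """
    deadline = clock() + max_queue_seconds
    while True:
        status = get_status()
        if status != "QUEUED":
            # The job started running, finished, or failed on its own.
            return status
        if clock() >= deadline:
            raise QueueTimeout(
                f"job still queued after {max_queue_seconds} seconds"
            )
        sleep(poll_seconds)
```

On `QueueTimeout`, the caller would cancel the job and report a failure instead of waiting indefinitely for quota. The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.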
Using the google-batch provider, I notice that some batch errors are not propagating to dsub, and it continues waiting to run jobs when it should be aborting. The process that launched it has retries=0, yet it still shows no failure and is patiently