Replies: 3 comments 2 replies
-
I think we've previously had issues with etcd load. With k8s Jobs, we'd create a Job object and the job controller would be responsible for creating the pods for that Job. So for each Job, the job controller needs to watch (subscribe to) updates on a record in etcd and reconcile changes to that record against the k8s cluster state. By creating pods directly, we reduce the load on etcd significantly. The problem is that our job throughput is very large, so solutions that use etcd as their main storage (which is essentially all of the k8s systems I've seen) struggle.
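For concreteness, here is a minimal sketch with client-go of the difference between the two paths. This is not Armada's actual submission code; the namespace, names and image are placeholders.

```go
// Sketch only: contrasts creating a Job (reconciled into Pods by the job
// controller) with creating a Pod directly.
package submit

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func podSpec() corev1.PodSpec {
	return corev1.PodSpec{
		RestartPolicy: corev1.RestartPolicyNever,
		Containers: []corev1.Container{
			{Name: "main", Image: "busybox", Command: []string{"echo", "hello"}},
		},
	}
}

// submitAsJob writes a Job record to the API server (and hence etcd); the job
// controller then has to watch that record and create/reconcile Pods for it.
func submitAsJob(ctx context.Context, cs kubernetes.Interface) error {
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "example-"},
		Spec:       batchv1.JobSpec{Template: corev1.PodTemplateSpec{Spec: podSpec()}},
	}
	_, err := cs.BatchV1().Jobs("default").Create(ctx, job, metav1.CreateOptions{})
	return err
}

// submitAsPod writes the Pod itself, so there is no intermediate Job record
// for a controller to watch: one less object and one less watch per job.
func submitAsPod(ctx context.Context, cs kubernetes.Interface) error {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "example-"},
		Spec:       podSpec(),
	}
	_, err := cs.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{})
	return err
}
```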
-
I've seen efforts to standardise a PodGroup API. See here for discussion on PodGroups:
-
From @michelsumbul, in response to why GR chose to go with their own custom API for submitting jobs:
-
Hello,
There is a large push to enhance the Kubernetes Job API in the Kubernetes batch working group: https://docs.google.com/document/d/12biwrj1vmovR-vSKOtSh7DTUahdGcrSzQNhqgrotrVM/edit#heading=h.xgjl2srtytjt
In general, they identified that the issue with Volcano and the other batch schedulers is that each supports its own way of specifying jobs. They acknowledge that the Job API may not be ready yet, but they seem to be moving towards adding the functionality it was missing.
They added a Kueue CRD for queueing that treats the Job API as a first-class citizen.
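For illustration, here is a minimal sketch of what that looks like in practice: a plain batch/v1 Job is pointed at a queue via a label and created suspended, so the queueing controller decides when it may start. The label key, the queue name and the suspend-on-create step are assumptions based on the Kueue docs, not something we run today.

```go
// Sketch only: a plain batch/v1 Job routed to an assumed Kueue LocalQueue.
package submit

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func submitQueuedJob(ctx context.Context, cs kubernetes.Interface) error {
	suspend := true
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "queued-example-",
			// Assumed queue-name label; "user-queue" stands in for an existing LocalQueue.
			Labels: map[string]string{"kueue.x-k8s.io/queue-name": "user-queue"},
		},
		Spec: batchv1.JobSpec{
			// Created suspended; the queueing controller unsuspends it when admitted.
			Suspend: &suspend,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{
						{Name: "main", Image: "busybox", Command: []string{"echo", "hello"}},
					},
				},
			},
		},
	}
	_, err := cs.BatchV1().Jobs("default").Create(ctx, job, metav1.CreateOptions{})
	return err
}
```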
Can we get some thoughts on what functionality was missing from the Job API and why we are using Pods/PodSpecs instead?
Should we include switching from pods to the Job API on our roadmap?