Replies: 3 comments 2 replies
-
I think we've previously had issues with etcd load. With k8s Jobs, we'd create a Job object and the job controller would be responsible for creating the pods for that Job. So for each Job, the job controller needs to watch (subscribe to) updates on a record in etcd and reconcile changes to that record against the k8s cluster state. By creating pods directly, we reduce the load on etcd significantly. The problem is that our job throughput is very large, so solutions that use etcd as their main storage (which is essentially all of the k8s systems I've seen) struggle.
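For concreteness, here is a minimal sketch with client-go of the difference between the two paths. This is not Armada's actual submission code; the namespace, names and image are placeholders.

```go
// Sketch only: contrasts creating a Job (reconciled into Pods by the job
// controller) with creating a Pod directly.
package submit

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func podSpec() corev1.PodSpec {
	return corev1.PodSpec{
		RestartPolicy: corev1.RestartPolicyNever,
		Containers: []corev1.Container{
			{Name: "main", Image: "busybox", Command: []string{"echo", "hello"}},
		},
	}
}

// submitAsJob writes a Job record to the API server (and hence etcd); the job
// controller then has to watch that record and create/reconcile Pods for it.
func submitAsJob(ctx context.Context, cs kubernetes.Interface) error {
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "example-"},
		Spec:       batchv1.JobSpec{Template: corev1.PodTemplateSpec{Spec: podSpec()}},
	}
	_, err := cs.BatchV1().Jobs("default").Create(ctx, job, metav1.CreateOptions{})
	return err
}

// submitAsPod writes the Pod itself, so there is no intermediate Job record
// for a controller to watch: one less object and one less watch per job.
func submitAsPod(ctx context.Context, cs kubernetes.Interface) error {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "example-"},
		Spec:       podSpec(),
	}
	_, err := cs.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{})
	return err
}
```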
-
I've seen efforts to standardise a PodGroup API. See here for discussion on PodGroups:
-
From @michelsumbul, in response to why GR chose to go with their own custom API for submitting jobs:
-
Hello,
There is a large push to enhance the Kubernetes Job API in the Kubernetes batch working group: https://docs.google.com/document/d/12biwrj1vmovR-vSKOtSh7DTUahdGcrSzQNhqgrotrVM/edit#heading=h.xgjl2srtytjt
In general, they identified that the issue with Volcano and the other batch schedulers is that each supports its own way of specifying jobs. They acknowledge that the Job API may not be ready yet, but they seem to be moving towards adding the functionality it was missing.
They added a Kueue CRD for queueing that treats the Job API as a first-class citizen.
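For illustration, here is a minimal sketch of what that looks like in practice: a plain batch/v1 Job is pointed at a queue via a label and created suspended, so the queueing controller decides when it may start. The label key, the queue name and the suspend-on-create step are assumptions based on the Kueue docs, not something we run today.

```go
// Sketch only: a plain batch/v1 Job routed to an assumed Kueue LocalQueue.
package submit

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func submitQueuedJob(ctx context.Context, cs kubernetes.Interface) error {
	suspend := true
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "queued-example-",
			// Assumed queue-name label; "user-queue" stands in for an existing LocalQueue.
			Labels: map[string]string{"kueue.x-k8s.io/queue-name": "user-queue"},
		},
		Spec: batchv1.JobSpec{
			// Created suspended; the queueing controller unsuspends it when admitted.
			Suspend: &suspend,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{
						{Name: "main", Image: "busybox", Command: []string{"echo", "hello"}},
					},
				},
			},
		},
	}
	_, err := cs.BatchV1().Jobs("default").Create(ctx, job, metav1.CreateOptions{})
	return err
}
```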
Can we get some thoughts on what functionality was missing from the Job API and why we are using Pods/PodSpecs instead?
Should we include switching from pods to the Job API on our roadmap?