[Core feature] Allow setting metadata (labels&annotations) individually on all K8s task types #6238
Motivation: Why do you think this is important?
Today, as a Flyte user, I have the following options to set labels/annotations on the pods/CRD objects of K8s tasks in a Flyte workflow execution:
Set via pod template:
This sets labels/annotations on the pods of individual tasks.
For distributed tasks (like PyTorch, Ray, ...) this sets the metadata not on the CRD object itself but on its pod template spec.
Set via `pyflyte run --labels ... --annotations ...`:
This applies the metadata to all K8s objects in a Flyte workflow execution, including task pods and task CRD objects. However, this mechanism doesn't work at the level of individual tasks.
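The difference in scope between the two mechanisms can be sketched in plain Python. This is only an illustration, not Flyte code: `resolve_metadata`, the task names, and the label values are hypothetical, and the rule that task-level values win over execution-level ones is an assumption.

```python
def resolve_metadata(execution_labels, task_labels, task_names):
    """Merge execution-wide labels with (hypothetical) per-task labels.

    Assumption: on key conflicts the per-task value takes precedence.
    """
    return {
        name: {**execution_labels, **task_labels.get(name, {})}
        for name in task_names
    }


# Hypothetical workflow with three tasks.
tasks = ["preprocess", "train", "evaluate"]

# Execution-wide labels reach every task.
execution_labels = {"team": "ml-platform"}

# Task-level labels reach only the tasks that declare them.
task_labels = {"train": {"kueue.x-k8s.io/queue-name": "gpu-queue"}}

resolved = resolve_metadata(execution_labels, task_labels, tasks)
for name, labels in resolved.items():
    print(name, labels)
```

Only `train` ends up with the queue label, while all three tasks keep the execution-wide `team` label.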
As a Flyte user, I would like to be able to set specific labels/annotations on individual K8s task CRD objects such as PyTorch jobs, Ray jobs, ... (the same way I already can today for pods via the pod template).
I propose to use the same syntax/flyteidl type for all K8s (non-pod) plugins like `Elastic`, `TfJob`, `MpiJob`, `RayJobConfig`, ...

In my concrete case, I would like to have this feature in order to leverage Kueue to gang schedule worker pods for distributed PyTorch training tasks (e.g. as documented here).
This requires setting a queue name annotation on the underlying PytorchJob CRD object.
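As a rough sketch of what this means at the manifest level, the snippet below merges metadata into the top-level `metadata` of a CRD object while leaving the worker pod template untouched. It uses plain dictionaries rather than any Flyte or Kubernetes client API. The helper `with_object_metadata`, the job name, and the queue name are hypothetical. The key `kueue.x-k8s.io/queue-name` is set as a label here, which is an assumption about how Kueue is configured; the helper accepts annotations in the same way.

```python
from copy import deepcopy


def with_object_metadata(manifest, labels=None, annotations=None):
    """Return a copy of `manifest` with labels/annotations merged into its
    top-level metadata. Pod template metadata under `spec` is not modified."""
    patched = deepcopy(manifest)
    metadata = patched.setdefault("metadata", {})
    metadata.setdefault("labels", {}).update(labels or {})
    metadata.setdefault("annotations", {}).update(annotations or {})
    return patched


# Simplified, hypothetical PyTorchJob manifest.
pytorch_job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "example-train"},
    "spec": {
        "pytorchReplicaSpecs": {
            "Worker": {
                "replicas": 2,
                # This is where pod-template metadata ends up today.
                "template": {"metadata": {"labels": {"from-pod-template": "true"}}},
            }
        }
    },
}

patched = with_object_metadata(
    pytorch_job,
    labels={"kueue.x-k8s.io/queue-name": "team-a-queue"},  # hypothetical queue
)
print(patched["metadata"])
```

The queue name lands on the job object itself, which is the part that a pod template alone cannot reach.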
There have been previous asks from the community to enable such a feature/integration:
There were attempts to integrate Yunikorn and Kueue more deeply into flytepropeller, but they weren't accepted.
In contrast, the feature I propose lets users choose to use Kueue without being a Flyte-Kueue integration. It is a very general feature that could serve any other integration that uses annotations/labels to select workloads.
There have been discussions in Slack about using Kueue, suggesting to use e.g. `pyflyte run --labels/--annotations` to set the required metadata. However, this is not good enough because it applies the metadata to all nodes in the graph, while you might want to do queueing/gang scheduling only for a subset.

Describe alternatives you've considered
Add task kwargs for labels and annotations:
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?