[Core feature] Allow setting metadata (labels&annotations) individually on all K8s task types #6238

Open
2 tasks done
fg91 opened this issue Feb 10, 2025 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@fg91
Member

fg91 commented Feb 10, 2025

Motivation: Why do you think this is important?

Today, as a Flyte user, I have the following options for setting labels/annotations on the pods/CRD objects of K8s tasks in a Flyte workflow execution:

  1. Set via pod template:

    @task(
        pod_template=PodTemplate(annotations=..., labels=...)
    )

    This sets labels/annotations on the pods of individual tasks.
    For distributed tasks (such as PyTorch, Ray, ...), this sets the metadata not on the CRD object itself but on its pod template spec.

  2. Set via pyflyte run --labels ... --annotations ...

    This applies the metadata to all K8s objects in a Flyte workflow execution, including task pods and task CRD objects. However, this mechanism cannot target individual tasks.
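
To make the difference between the two metadata locations concrete, here is a minimal sketch that builds a PyTorchJob-like manifest as a plain Python dict. The helper `build_job_manifest`, the job name, and the `team` annotation are hypothetical. The field layout only loosely follows the Kubeflow PyTorchJob CRD and is not what flytepropeller literally generates.

```python
from typing import Any, Dict, Optional


def build_job_manifest(
    pod_annotations: Optional[Dict[str, str]] = None,
    object_annotations: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper showing the two places metadata can live."""
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        # Object-level metadata: the part this issue wants to make settable per task.
        "metadata": {
            "name": "example-job",
            "annotations": dict(object_annotations or {}),
        },
        "spec": {
            "pytorchReplicaSpecs": {
                "Worker": {
                    "template": {
                        # Pod template metadata: where pod_template=... ends up today.
                        "metadata": {"annotations": dict(pod_annotations or {})},
                    }
                }
            }
        },
    }


# Setting annotations only via the pod template leaves the CRD object itself bare.
manifest = build_job_manifest(pod_annotations={"team": "ml"})
worker_template = manifest["spec"]["pytorchReplicaSpecs"]["Worker"]["template"]
print(manifest["metadata"]["annotations"])         # {}
print(worker_template["metadata"]["annotations"])  # {'team': 'ml'}
```

A controller that selects workloads by the metadata of the CRD object would see nothing in this case, which is the gap described above.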

As a Flyte user, I would like to be able to set labels/annotations on the CRD objects of individual K8s tasks, such as PyTorch jobs, Ray jobs, ... (the same way I can already do today for pods via the pod template):

@task(
    task_config=PyTorch(
        num_workers=...,
        ...
        # Proposed addition:
        metadata=ObjectMeta(
            annotations={"kueue.x-k8s.io/queue-name": "queue-name"},
            labels={...}
        )
    )
)

I propose using the same syntax/flyteidl type for all K8s (non-pod) plugins, such as Elastic, TfJob, MpiJob, RayJobConfig, ...
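
As a rough illustration of what "the same type for all plugins" could look like, here is a minimal sketch using plain dataclasses. `ObjectMeta` here is a stand-in for the proposed type and does not exist in flytekit. `PyTorchConfigSketch` and `RayJobConfigSketch` are hypothetical placeholders, not the real plugin config classes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ObjectMeta:
    """Hypothetical stand-in for the proposed type; not an existing flytekit class."""

    labels: Dict[str, str] = field(default_factory=dict)
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class PyTorchConfigSketch:
    """Placeholder for a PyTorch-style task config with the proposed field."""

    num_workers: int = 1
    metadata: ObjectMeta = field(default_factory=ObjectMeta)


@dataclass
class RayJobConfigSketch:
    """Placeholder for a Ray-style task config reusing the same metadata type."""

    metadata: ObjectMeta = field(default_factory=ObjectMeta)


# The same metadata object can be attached to any plugin config.
queue_meta = ObjectMeta(annotations={"kueue.x-k8s.io/queue-name": "queue-name"})
pytorch_cfg = PyTorchConfigSketch(num_workers=2, metadata=queue_meta)
ray_cfg = RayJobConfigSketch(metadata=queue_meta)
```

Sharing one type would keep the user-facing syntax identical across plugins, so each backend plugin would only need to copy the labels and annotations onto the CRD object it creates.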


In my concrete case, I would like this feature in order to use Kueue to gang-schedule the worker pods of distributed PyTorch training tasks (e.g. as documented here).
This requires setting a queue name annotation on the underlying PyTorchJob CRD object.

There have been previous asks from the community to enable such a feature/integration:

  • There were attempts to integrate YuniKorn and Kueue more deeply into flytepropeller, but these were not accepted.

    In contrast, the feature I propose lets users opt into Kueue without being a Flyte-Kueue integration. It is a general feature that could serve any other integration that selects workloads by annotations/labels.

  • There have been discussions in Slack about using Kueue, with suggestions to set the required metadata via e.g. pyflyte run --labels/--annotations. However, this is not sufficient because it applies the metadata to all nodes in the graph, while you might want queueing/gang scheduling for only a subset.

Describe alternatives you've considered

Add task kwargs for labels and annotations:

@task(
    # If we added these args ...
    labels={...},
    annotations={...},
    # ... for simple python function tasks this would conflict with this existing arg:
    pod_template=PodTemplate(annotations=...)
)

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@fg91 fg91 added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Feb 10, 2025
@fg91 fg91 changed the title [Core feature] Enable gang scheduling on GKE with Kueue by allowing to set metadata on distributed job resources [Core feature] Allow setting metadata (labels&annotations) on all K8s task types Feb 11, 2025
@fg91 fg91 changed the title [Core feature] Allow setting metadata (labels&annotations) on all K8s task types [Core feature] Allow setting metadata (labels&annotations) individually on all K8s task types Feb 11, 2025
@fg91 fg91 self-assigned this Feb 11, 2025
@fg91
Member Author

fg91 commented Feb 13, 2025

Notes from the contributors' sync:
We need to agree on precedence when metadata set in task_config conflicts with metadata set elsewhere, e.g. via pyflyte run.

@fg91
Member Author

fg91 commented Feb 13, 2025

@eapolinario there cannot be a conflict with metadata set via @task(pod_template=...) because that is applied to the CRD object's pod template spec and not the object itself.
But there can be a conflict with pyflyte run --annotations ... --labels ....

I would say when a user specifies metadata via pyflyte run, this should have precedence. But I'm open to implementing it either way. Do you have a strong opinion?

@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Feb 20, 2025
@eapolinario
Contributor

I would say when a user specifies metadata via pyflyte run, this should have precedence.

I agree.
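
The precedence agreed on above could be sketched as a simple dictionary merge in which execution-level values (from pyflyte run) overwrite task-level values on conflicting keys. This is only an illustration; `merge_metadata` and the `owner` key are hypothetical, and it is not how flytepropeller implements the merge.

```python
from typing import Dict


def merge_metadata(
    task_level: Dict[str, str], execution_level: Dict[str, str]
) -> Dict[str, str]:
    """Merge two label/annotation maps; execution-level values win on conflict."""
    merged = dict(task_level)       # start from the per-task metadata
    merged.update(execution_level)  # execution-level keys overwrite duplicates
    return merged


task_annotations = {"kueue.x-k8s.io/queue-name": "queue-name", "owner": "task"}
run_annotations = {"owner": "cli"}

merged = merge_metadata(task_annotations, run_annotations)
print(merged)
# {'kueue.x-k8s.io/queue-name': 'queue-name', 'owner': 'cli'}
```

Keys that appear only at the task level, such as the queue name here, are kept, so per-task settings still take effect unless the execution explicitly overrides them.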

Projects
Status: Backlog
Development

No branches or pull requests

2 participants