Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for specifying execution cluster labels in pyflyte #2422

Merged
merged 2 commits into from
Jun 21, 2024

Conversation

va6996
Copy link
Contributor

@va6996 va6996 commented May 15, 2024

Tracking issue

NA

Why are the changes needed?

Today we support the execution cluster label for specifying the execution cluster on an execution level. However there is no way to do that using the python sdk/pyflyte. This change adds support for the above mentioned.

What changes were proposed in this pull request?

We add another flag to pyflyte to allow users to specify the cluster label. Additionally when launching a wf, this info is passed to flyteadmin via the sdk.

How was this patch tested?

This patch was tested on a local instance of flyte. Test cases were modified to validate the changes.

Setup process

pyflyte run --ecl my-label mywf.py wf

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Copy link

welcome bot commented May 15, 2024

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@Tom-Newton
Copy link
Contributor

Tom-Newton commented May 16, 2024

Thanks for making the PR. I have not looked at it in detail but I think we need to upgrade the required version of flyteidl. I think at least version v1.11.1-b1 is needed here

"flyteidl>=1.11.0b1",

Older versions seem to be missing the execution_cluster_label field on ExecutionSpec https://github.com/flyteorg/flyte/blob/f5a4ce0deabdddd16ea1c18fb9b44ac0be4f831f/flyteidl/protos/flyteidl/admin/execution.proto#L336

@fg91 fg91 self-requested a review May 16, 2024 10:06
Copy link
Member

@fg91 fg91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was halfway through implementing the exact same thing because we need this as well when I figured I should check whether somebody else is working on this given that the backend recently added support for this.

So thank you for having added this functionality to flytekit 🙏

I tested your PR in a GCP-based multi-cluster deployment and can confirm that the cluster selection works. (For the record: I didn't use pyflyte but FlyteRemote directly.)

I proposed a minor rewording suggestion for the doc strings, not blocking though, so LGTM after addressing @Tom-Newton 's comment about the flyteidl version.

@@ -239,7 +239,7 @@ class RunLevelParams(PyFlyteParams):
)
limit: int = make_click_option_field(
click.Option(
param_decls=["--limit", "limit"],
param_decls=["--limit"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: bit of scope creep ;)

Still valid of course.

flytekit/clis/sdk_in_container/run.py Outdated Show resolved Hide resolved
flytekit/models/execution.py Outdated Show resolved Hide resolved
flytekit/remote/remote.py Outdated Show resolved Hide resolved
flytekit/remote/remote.py Outdated Show resolved Hide resolved
flytekit/remote/remote.py Outdated Show resolved Hide resolved
flytekit/remote/remote.py Outdated Show resolved Hide resolved
Copy link
Member

@fg91 fg91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw that tests aren't passing yet, can you please fix this and request another review?

Copy link

codecov bot commented May 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.04%. Comparing base (779c912) to head (d8aaa8c).
Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2422      +/-   ##
==========================================
+ Coverage   71.81%   76.04%   +4.22%     
==========================================
  Files         183      182       -1     
  Lines       18553    18565      +12     
  Branches     3652     3654       +2     
==========================================
+ Hits        13324    14117     +793     
+ Misses       4587     3821     -766     
+ Partials      642      627      -15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -256,6 +256,16 @@ class RunLevelParams(PyFlyteParams):
help="Assign newly created execution to a given cluster pool",
)
)
execution_cluster_label: str = make_click_option_field(
click.Option(
param_decls=["--execution-cluster-label", "-ecl"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
param_decls=["--execution-cluster-label", "-ecl"],
param_decls=["--execution-cluster-label", "--ecl"],

Should there be two hyphens? See e.g. ["--envvars", "--env"].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent was to use one similar to how you'd have for --version = -v? Lmk what you think. -ecl should work too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the existing args ...

param_decls=["--envvars", "--env"],
param_decls=["--raw-output-data-prefix", "--raw-data-prefix"],
param_decls=["-r", "--remote"],

... I'd say currently we only use a single hyphen if it is followed by a single character like -v or -r. If multiple characters follow like --env, we use two hyphens.

I don't have strong opinions about this but I'd propose to just stick with -- here as well.

@davidmirror-ops
Copy link

@eapolinario any idea of how to fix the CI check?

@eapolinario
Copy link
Collaborator

@davidmirror-ops , CI failure is being fixed in #2477

@fg91
Copy link
Member

fg91 commented Jun 13, 2024

@davidmirror-ops , CI failure is being fixed in #2477

@va6996 the fix has been merged. Could you please rebase? After that, the PR can be approved from my side.

@fg91 fg91 self-requested a review June 20, 2024 13:47
Copy link
Member

@fg91 fg91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, will be very useful 🙏

@davidmirror-ops
Copy link

@eapolinario @cosmicBboy @wild-endeavor @pingsutw if any of you can review, we can unblock this PR. Thanks!

@eapolinario eapolinario merged commit b06d33e into flyteorg:master Jun 21, 2024
49 of 50 checks passed
Copy link

welcome bot commented Jun 21, 2024

Congrats on merging your first pull request! 🎉

@Future-Outlier
Copy link
Member

Hi, @va6996 I can't run this PR on the remote flyte cluster.
Can you help me check if this is a bug on flytekit?

from flytekit.loggers import logger
from flytekit import ImageSpec, Secret, task, workflow
from flytekit import Resources

@task(requests=Resources(cpu="800m", mem="900Mi"))
def say_hello() -> str:
    return "hello"

@workflow
def wf() -> str:
    return say_hello()

if __name__ == "__main__":
    print(f"Running say_hello() {say_hello()}")
    # rgister(wf, endpoint="localhost:30081", project="flyteworkshop", domain="development", name="say_hello")
image

@va6996
Copy link
Contributor Author

va6996 commented Jun 22, 2024

Hi, @va6996 I can't run this PR on the remote flyte cluster. Can you help me check if this is a bug on flytekit?

Could you please send the stack trace so I could debug? I wonder if flyteidl/flyteadmin needs to be updated to understand execution_cluster_label? Could you also run pip show flyteidl and check the version?

@Future-Outlier
Copy link
Member

Could you please send the stack trace so I could debug? I wonder if flyteidl/flyteadmin needs to b

Flyte, flytekit and flyteidl version

flyte version: master branch with latest version
flytekit version: master branch with latest version
flyteidl version: master branch with latest version

(dev) future@outlier ~ % pip list | grep -iE flyte   
flyteidl                                 1.11.0b2.dev39+g8fc782f25 /Users/future-outlier/code/flyte/flyteidl
flytekit                                 1.12.4.dev19+g876877abd

Example

above

Stack Trace Error Message

(dev) future@outlier ~ % pyflyte --verbose run --remote task_example.py wf

Running Execution on Remote.
Verbose mode on
╭────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ /Users/future-outlier/miniconda3/envs/dev/bin/pyflyte:8 in <module>                                                   │
│                                                                                                                       │
│ ❱ 8sys.exit(main())                                                                                              │
│                                                                                                                       │
│ /Users/future-outlier/miniconda3/envs/dev/lib/python3.9/site-packages/click/core.py:1157 in __call__                  │
│                                                                                                                       │
│ ❱ 1157 │   │   return self.main(*args, **kwargs)                                                                      │
│                                                                                                                       │
│ /Users/future-outlier/miniconda3/envs/dev/lib/python3.9/site-packages/rich_click/rich_command.py:126 in main          │
│                                                                                                                       │
│ ❱ 126 │   │   │   │   │   rv = self.invoke(ctx)                                                                       │
│                                                                                                                       │
│ /Users/future-outlier/code/flytekit/flytekit/clis/sdk_in_container/utils.py:148 in invoke                             │
│                                                                                                                       │
│ ❱ 148 │   │   │   │   raise e                                                                                         │
│                                                                                                                       │
│ /Users/future-outlier/code/flytekit/flytekit/clis/sdk_in_container/utils.py:142 in invoke                             │
│                                                                                                                       │
│ ❱ 142 │   │   │   return super().invoke(ctx)                                                                          │
│                                                                                                                       │
│ /Users/future-outlier/miniconda3/envs/dev/lib/python3.9/site-packages/click/core.py:1688 in invoke                    │
│                                                                                                                       │
│ ❱ 1688 │   │   │   │   │   return _process_result(sub_ctx.command.invoke(sub_ctx))                                    │
│                                                                                                                       │
│ /Users/future-outlier/miniconda3/envs/dev/lib/python3.9/site-packages/click/core.py:1688 in invoke                    │
│                                                                                                                       │
│ ❱ 1688 │   │   │   │   │   return _process_result(sub_ctx.command.invoke(sub_ctx))                                    │
│                                                                                                                       │
│ /Users/future-outlier/miniconda3/envs/dev/lib/python3.9/site-packages/click/core.py:1688 in invoke                    │
│                                                                                                                       │
│ ❱ 1688 │   │   │   │   │   return _process_result(sub_ctx.command.invoke(sub_ctx))                                    │
│                                                                                                                       │
│ /Users/future-outlier/miniconda3/envs/dev/lib/python3.9/site-packages/click/core.py:1434 in invoke                    │
│                                                                                                                       │
│ ❱ 1434 │   │   │   return ctx.invoke(self.callback, **ctx.params)                                                     │
│                                                                                                                       │
│ /Users/future-outlier/miniconda3/envs/dev/lib/python3.9/site-packages/click/core.py:783 in invoke                     │
│                                                                                                                       │
│ ❱  783 │   │   │   │   return __callback(*args, **kwargs)                                                             │
│                                                                                                                       │
│ /Users/future-outlier/code/flytekit/flytekit/clis/sdk_in_container/run.py:580 in _run                                 │
│                                                                                                                       │
│ ❱ 580 │   │   │   │   run_remote(                                                                                     │
│                                                                                                                       │
│ /Users/future-outlier/code/flytekit/flytekit/clis/sdk_in_container/run.py:448 in run_remote                           │
│                                                                                                                       │
│ ❱ 448execution = remote.execute(                                                                                 │
│                                                                                                                       │
│ /Users/future-outlier/code/flytekit/flytekit/remote/remote.py:1337 in execute                                         │
│                                                                                                                       │
│ ❱ 1337 │   │   │   return self.execute_remote_wf(                                                                     │
│                                                                                                                       │
│ /Users/future-outlier/code/flytekit/flytekit/remote/remote.py:1518 in execute_remote_wf                               │
│                                                                                                                       │
│ ❱ 1518 │   │   return self.execute_remote_task_lp(                                                                    │
│                                                                                                                       │
│ /Users/future-outlier/code/flytekit/flytekit/remote/remote.py:1479 in execute_remote_task_lp                          │
│                                                                                                                       │
│ ❱ 1479 │   │   return self._execute(                                                                                  │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: _execute() got an unexpected keyword argument 'execution_cluster_label'

@Future-Outlier
Copy link
Member

Hi, @va6996 I can't run this PR on the remote flyte cluster. Can you help me check if this is a bug on flytekit?

Could you please send the stack trace so I could debug? I wonder if flyteidl/flyteadmin needs to be updated to understand execution_cluster_label? Could you also run pip show flyteidl and check the version?

Wait, maybe there's some error on my setup, Kevin can run it correctly

bgedik pushed a commit to bgedik/flytekit that referenced this pull request Jul 3, 2024
fiedlerNr9 pushed a commit that referenced this pull request Jul 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants