[Kubeflow-Training Action] Update notebook's pipfile to sync with Kubeflow-Training SDK release 1.9.0 #920

abhijeet-dhumal · 2025-02-24T18:43:51Z

🚀 This is an automated Pull Request generated by odh-kfto-sdk-notebooks-sync.yml workflow.

This PR updates the Pipfile to sync with the latest Kubeflow-Training SDK release.

Signed-off-by: abhijeet-dhumal <[email protected]>

openshift-ci · 2025-02-24T18:44:13Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign caponetto for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

andyatmiami · 2025-02-24T20:35:28Z

Splatting some output here from running some of my own local verification scripts:

Image	Baseline image size	Modified image size
codeserver	2183542739	2183542737
datascience	3228213715	3228220369
pytorch (cuda)	11329757652	11329750996
pytorch (rocm)	22809433042	22809439701
trustyai	11325221842	11325229012

✅ There are no significant differences in image size after applying KFTO 1.9.0
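
A minimal sketch of how the comparison above could be scripted. It assumes the sizes are byte counts and treats any change under 1% of the baseline as insignificant; both are assumptions, since the thread states neither the unit nor the threshold actually used.

```python
# Image sizes copied from the comment above: (baseline, modified).
# Assumption: the values are byte counts; the thread does not state the unit.
IMAGE_SIZES = {
    "codeserver": (2183542739, 2183542737),
    "datascience": (3228213715, 3228220369),
    "pytorch (cuda)": (11329757652, 11329750996),
    "pytorch (rocm)": (22809433042, 22809439701),
    "trustyai": (11325221842, 11325229012),
}

# Hypothetical cut-off for a "significant" change, chosen for illustration only.
THRESHOLD_PERCENT = 1.0


def size_deltas(sizes):
    """Return {image: (absolute delta, delta as a percentage of the baseline)}."""
    return {
        name: (modified - baseline, 100.0 * (modified - baseline) / baseline)
        for name, (baseline, modified) in sizes.items()
    }


deltas = size_deltas(IMAGE_SIZES)
for name, (delta, percent) in deltas.items():
    print(f"{name}: {delta:+d} ({percent:+.6f}%)")

# Collect every image whose relative change reaches the threshold.
significant = [
    name for name, (_, percent) in deltas.items()
    if abs(percent) >= THRESHOLD_PERCENT
]
print("significant changes:", significant or "none")
```

With the numbers above, every delta is a few kilobytes at most, which is consistent with the conclusion that the image sizes are effectively unchanged.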

andyatmiami · 2025-02-24T20:39:10Z

As a general comment / for future discussion... (see: slack thread)

This block from pipelines team "makes me think":

https://github.com/opendatahub-io/training-operator/blob/add-kfto-sdk-notebooks-sync/.github/workflows/odh-kfto-sdk-notebooks-sync.yaml#L81-L84

This is an adaptation of a similar line from codeflare-sdk team:

https://github.com/project-codeflare/codeflare-sdk/blob/main/.github/workflows/odh-notebooks-sync.yml#L74-L78

In both workflows, we pass the --pre flag to the pipenv lock command, which is NOT a flag the IDE team passes when regenerating the lock file.

This introduces differences that are then "un-introduced" when our scheduled piplock renewal action fires in the notebooks/ repo. This "whiplash" in the commit history seems unnecessary, and we should probably align all of this at some point in the future to avoid commits like this getting "undone".

As a general rule of thumb (imho), for the sake of stability we should err on the side of NOT pulling in --pre dependencies (unless there is a specific, documented reason).

However, given that the codeflare-sdk dependency update solution has been in place for quite some time now, this is just an "FYI" comment and not a "call to action" in the context of this PR.

andyatmiami · 2025-02-24T20:40:13Z

/lgtm

harshad16

@abhijeet-dhumal, is the update of requirements.txt being made from https://github.com/opendatahub-io/training-operator/pull/31/files?
Maybe I'm missing the line where this change is made.

Also, I'm not sure we want an outside workflow to update requirements.txt.
Maybe it would be best for the IDE team to update that in-house based on changes in the Pipfile.
WDYT @jiridanek? This way, we would be able to keep it stable.

openshift-ci · 2025-02-24T22:44:11Z

@abhijeet-dhumal: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror	`ce30db5`	link	true	`/test notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror`
ci/prow/rocm-notebooks-e2e-tests	`ce30db5`	link	true	`/test rocm-notebooks-e2e-tests`
ci/prow/codeserver-notebook-e2e-tests	`ce30db5`	link	true	`/test codeserver-notebook-e2e-tests`
ci/prow/notebooks-ubi9-e2e-tests	`ce30db5`	link	true	`/test notebooks-ubi9-e2e-tests`
ci/prow/images	`ce30db5`	link	true	`/test images`
ci/prow/notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror	`ce30db5`	link	true	`/test notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror`
ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-11-pr-image-mirror	`ce30db5`	link	true	`/test notebook-rocm-jupyter-pyt-ubi9-python-3-11-pr-image-mirror`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

abhijeet-dhumal · 2025-02-25T05:40:35Z

> @abhijeet-dhumal, is the update of requirements.txt being made from https://github.com/opendatahub-io/training-operator/pull/31/files? Maybe I'm missing the line where this change is made.
>
> Also, I'm not sure we want an outside workflow to update requirements.txt. Maybe it would be best for the IDE team to update that in-house based on changes in the Pipfile. WDYT @jiridanek? This way, we would be able to keep it stable.

@harshad16 If I understand correctly, running the shell script generate_code.sh is what updates the requirements.txt file 👀
https://github.com/opendatahub-io/training-operator/pull/31/files#diff-43b9eca4300953f4eef18a388f4aaace196f07559af6f0d11ead836f6f81b074R143
cc: @jiridanek

harshad16 · 2025-02-25T06:02:25Z

> If I understand correctly, running the shell script generate_code.sh is what updates the requirements.txt file 👀 https://github.com/opendatahub-io/training-operator/pull/31/files#diff-43b9eca4300953f4eef18a388f4aaace196f07559af6f0d11ead836f6f81b074R143

Ah, right, I totally missed it.
It makes sense now why it was added; thanks for pointing me to this.

Can you take a look at the --pre flag? If it is not intended, can you remove it and recompile this file?

abhijeet-dhumal · 2025-02-25T09:35:28Z

> > If I understand correctly, running the shell script generate_code.sh is what updates the requirements.txt file 👀 https://github.com/opendatahub-io/training-operator/pull/31/files#diff-43b9eca4300953f4eef18a388f4aaace196f07559af6f0d11ead836f6f81b074R143
>
> Ah, right, I totally missed it. It makes sense now why it was added; thanks for pointing me to this.
>
> Can you take a look at the --pre flag? If it is not intended, can you remove it and recompile this file?

@harshad16 hey, if I try running the workflow without the --pre flag, it raises the issue below. Have you seen it before?


pipenv.exceptions.ResolutionFailure: ERROR: No matching distribution found for 
torch~=2.4.0
✘ Locking Failed!
Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this mechanism, then 
run $ pipenv graph to inspect the versions actually installed in the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
ERROR: Failed to lock Pipfile.lock!
Failed to lock dependencies
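
For reference, the `torch~=2.4.0` specifier in the error above is a PEP 440 "compatible release" clause, equivalent to `>=2.4.0, ==2.4.*`. The following is a minimal sketch of what that clause accepts, not the resolver pipenv actually uses: the helper names are hypothetical, and it only handles plain dotted numeric versions (no pre-release, local, or epoch segments).

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version such as '2.4.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec_version: str) -> bool:
    """Illustrative check for 'candidate ~= spec_version' (hypothetical helper).

    '~=X.Y.Z' means: at least X.Y.Z, and the same X.Y prefix.
    """
    spec = parse_release(spec_version)
    if len(spec) < 2:
        raise ValueError("~= requires at least two release segments")

    cand = parse_release(candidate)
    # Pad the candidate with zeros so that '2.4' compares equal to '2.4.0'.
    cand = cand + (0,) * (len(spec) - len(cand))

    prefix = spec[:-1]  # every segment except the last must match exactly
    return cand >= spec and cand[: len(prefix)] == prefix


for version in ("2.3.9", "2.4.0", "2.4.1", "2.5.0"):
    print(version, compatible_release(version, "2.4.0"))
```

Under this reading, only 2.4.x releases at or above 2.4.0 satisfy the constraint, so the lock fails whenever the resolver cannot see any such distribution for the environment it is resolving in.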

jiridanek · 2025-02-25T10:12:49Z

Run pipenv with --verbose. Change it in the workflow also. You want --verbose output.

edit:

no, there isn't any additional --verbose output;

ERROR: No matching distribution found for 
torch~=2.4.0

is as verbose as it's going to get

jiridanek · 2025-02-25T12:03:00Z

Here's how the locking goes if I run gh pr checkout 920, push only the Pipfile changes without locking, and let our GitHub Action do the locking with make refresh-pipfilelock-files.

Everything seems to lock fine:

[Kubeflow-Training] Update notebook's pipfile to sync with Kubeflow-Training SDK release 1.9.0 #921

abhijeet-dhumal · 2025-02-25T14:07:27Z

Using a cleaned commit history, I updated the version in the Pipfiles only
and ran the make command as suggested here: #920 (comment)
but...: https://privatebin.corp.redhat.com/?c8a16431184d1426#CQTxkApiWRNBVL2qYpCu9uH3mz5jXEMQCmC3LH54kvtb
@jiridanek pointed out that the issue is with macOS, which I'm using. I will test it again tomorrow 👍
In the meantime, can someone please raise a PR to update the kfto-sdk version manually?

jiridanek · 2025-02-25T14:09:21Z

> In the meantime, can someone please raise a PR to update the kfto-sdk version manually?

[Kubeflow-Training] Update notebook's pipfile to sync with Kubeflow-Training SDK release 1.9.0 #921 should be just that

openshift-merge-robot · 2025-02-26T22:51:19Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Updated notebooks via odh-sync-updater-13505252594 GitHub action

ce30db5

Signed-off-by: abhijeet-dhumal <[email protected]>

openshift-ci bot requested review from dibryant and paulovmr February 24, 2025 18:44

openshift-ci bot added the size/xxl label Feb 24, 2025

This was referenced Feb 24, 2025

[Kubeflow-Training Action] Update notebook's pipfile to sync with Kubeflow-Training SDK release 1.9.0 #911

Closed

RHOAIENG-20205: update kubeflow-training sdk version to 1.9.0 #905

Closed

openshift-ci bot assigned andyatmiami Feb 24, 2025

openshift-ci bot added the lgtm label Feb 24, 2025

harshad16 reviewed Feb 24, 2025

abhijeet-dhumal requested a review from harshad16 February 25, 2025 05:41

jiridanek mentioned this pull request Feb 25, 2025

[Kubeflow-Training] Update notebook's pipfile to sync with Kubeflow-Training SDK release 1.9.0 #921

Merged

3 tasks

openshift-merge-robot added the needs-rebase label Feb 26, 2025

abhijeet-dhumal marked this pull request as draft February 27, 2025 05:42

openshift-ci bot added the do-not-merge/work-in-progress label Feb 27, 2025
