Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kubeflow-Training Action] Update notebook's pipfile to sync with Kubeflow-Training SDK release 1.9.0 #920

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

abhijeet-dhumal
Copy link
Contributor

🚀 This is an automated Pull Request generated by odh-kfto-sdk-notebooks-sync.yml workflow.

This PR updates the Pipfile to sync with latest Kubeflow-Training SDK release.

@openshift-ci openshift-ci bot requested review from dibryant and paulovmr February 24, 2025 18:44
Copy link
Contributor

openshift-ci bot commented Feb 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign caponetto for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andyatmiami
Copy link
Contributor

andyatmiami commented Feb 24, 2025

Splatting some output here from running some of my own local verification scripts:

codeserver

Baseline image size: 2183542739

Modified image size: 2183542737

datascience

Baseline image size: 3228213715

Modified image size: 3228220369

pytorch (cuda)

Baseline image size: 11329757652

Modified image size: 11329750996

pytorch (rocm)

Baseline image size: 22809433042

Modified image size: 22809439701

trustyai

Baseline image size: 11325221842

Modified image size: 11325229012

✅ There are no significant differences in image size after applying KFTO 1.9.0

@andyatmiami
Copy link
Contributor

andyatmiami commented Feb 24, 2025

As a general comment / for future discussion... (see: slack thread)

This block from pipelines team "makes me think":

This is an adaptation of a similar line from codeflare-sdk team:

In both of these logics... we are passing --pre flag to the pipenv lock command.. which is NOT a flag the IDE team passes when regenerating the lock file:

This leads to differences getting introduced.. only to then be "un-introduced" when our schedule piplock renewal action fires in the notebooks/ repo... this "whiplash" in the commit history seems unnecessary and we should probably align all this at some point in the future to avoid commits like this getting "undone":
image

as a general rule of thumb (imho) - it would seem for sake of stability/etc - we should err on the side of NOT pulling in --pre dependencies (unless there is a specific/documented reason)

however, given the codeflare-sdk dependency update solution has been in place for quite some time now - this is just an "FYI" comment and not a "call to action" in the context of this PR.

@andyatmiami
Copy link
Contributor

/lgtm

Copy link
Member

@harshad16 harshad16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abhijeet-dhumal , is the update of requirements.txt being made from https://github.com/opendatahub-io/training-operator/pull/31/files ?
maybe i m missing the line where this changes is being made.

also i m not sure, we want outside workflow to update the requirements.txt
maybe it would be best for IDE team to update that in-house based on changes in Pipfile.
WDYT @jiridanek ? this way, we would be able to keep it stable.

Copy link
Contributor

openshift-ci bot commented Feb 24, 2025

@abhijeet-dhumal: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror ce30db5 link true /test notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror
ci/prow/rocm-notebooks-e2e-tests ce30db5 link true /test rocm-notebooks-e2e-tests
ci/prow/codeserver-notebook-e2e-tests ce30db5 link true /test codeserver-notebook-e2e-tests
ci/prow/notebooks-ubi9-e2e-tests ce30db5 link true /test notebooks-ubi9-e2e-tests
ci/prow/images ce30db5 link true /test images
ci/prow/notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror ce30db5 link true /test notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror
ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-11-pr-image-mirror ce30db5 link true /test notebook-rocm-jupyter-pyt-ubi9-python-3-11-pr-image-mirror

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@abhijeet-dhumal
Copy link
Contributor Author

@abhijeet-dhumal , is the update of requirements.txt being made from https://github.com/opendatahub-io/training-operator/pull/31/files ? maybe i m missing the line where this changes is being made.

also i m not sure, we want outside workflow to update the requirements.txt maybe it would be best for IDE team to update that in-house based on changes in Pipfile. WDYT @jiridanek ? this way, we would be able to keep it stable.

@harshad16 If I understand correctly, The shell script generate_code.sh run is responsible for updating requirements.txt file 👀
https://github.com/opendatahub-io/training-operator/pull/31/files#diff-43b9eca4300953f4eef18a388f4aaace196f07559af6f0d11ead836f6f81b074R143
cc: @jiridanek

@harshad16
Copy link
Member

If I understand correctly, The shell script generate_code.sh run is responsible for updating requirements.txt file 👀 https://github.com/opendatahub-io/training-operator/pull/31/files#diff-43b9eca4300953f4eef18a388f4aaace196f07559af6f0d11ead836f6f81b074R143

ah , right, totally missed it.
It makes sense now on why it was added, thanks for pointing me to this.

can you take a look at the --pre flag ? if it not intended, can you remove and recompile this file ?

@abhijeet-dhumal
Copy link
Contributor Author

If I understand correctly, The shell script generate_code.sh run is responsible for updating requirements.txt file 👀 https://github.com/opendatahub-io/training-operator/pull/31/files#diff-43b9eca4300953f4eef18a388f4aaace196f07559af6f0d11ead836f6f81b074R143

ah , right, totally missed it. It makes sense now on why it was added, thanks for pointing me to this.

can you take a look at the --pre flag ? if it not intended, can you remove and recompile this file ?

@harshad16 hey, If I try running workflow without --pre flag, it raises below issue, have you seen it before? :


pipenv.exceptions.ResolutionFailure: ERROR: No matching distribution found for 
torch~=2.4.0
✘ Locking Failed!
Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this mechanism, then 
run $ pipenv graph to inspect the versions actually installed in the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
ERROR: Failed to lock Pipfile.lock!
Failed to lock dependencies

@jiridanek
Copy link
Member

jiridanek commented Feb 25, 2025

Run pipenv with --verbose. Change it in the workflow also. You want --verbose output.

edit:

no, there's any additional --verbose,

ERROR: No matching distribution found for 
torch~=2.4.0

is as verbose as it's going to get

@jiridanek
Copy link
Member

jiridanek commented Feb 25, 2025

Here's how the locking goes if I do

gh pr checkout 920, push only the Pipfile changes without locking, and let our gha do locking with make refresh-pipfilelock-files

everything seems to lock fine

@abhijeet-dhumal
Copy link
Contributor Author

abhijeet-dhumal commented Feb 25, 2025

Using cleaned commit history, I updated version in only pipfiles
and ran make command as suggested here : #920 (comment)
but... : https://privatebin.corp.redhat.com/?c8a16431184d1426#CQTxkApiWRNBVL2qYpCu9uH3mz5jXEMQCmC3LH54kvtb
@jiridanek pointed out that the issue is with macos which I'm using, I will test it again tomorrow 👍
In the meantime can someone please raise PR to update kfto-sdk version manually ?

@jiridanek
Copy link
Member

jiridanek commented Feb 25, 2025

In the meantime can someone please raise PR to update kfto-sdk version manually ?

@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants