-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enforce wheel size limits, README formatting in CI #114
Conversation
ci/validate_wheel.sh
Outdated
python -m pip install \ | ||
'pydistcheck==0.8.*' \ | ||
twine | ||
|
||
pyenv rehash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, I was seeing CI failures like this in this PR:
./ci/validate_wheel.sh: line 11: pydistcheck: command not found
Think I see the root cause.
pydistcheck
is available from the rapidsai/ci-wheel
Docker image: rapidsai/ci-imgs#205
This project does all of its CUDA 12 builds using CUDA 12.0.1.
pynvjitlink/.github/actions/compute-matrix/action.yaml
Lines 18 to 25 in 666375a
export BUILD_MATRIX=" | |
- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.10', LINUX_VER: 'rockylinux8' } | |
- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.11', LINUX_VER: 'rockylinux8' } | |
- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.12', LINUX_VER: 'rockylinux8' } | |
- { CUDA_VER: '12.0.1', ARCH: 'arm64', PY_VER: '3.10', LINUX_VER: 'rockylinux8' } | |
- { CUDA_VER: '12.0.1', ARCH: 'arm64', PY_VER: '3.11', LINUX_VER: 'rockylinux8' } | |
- { CUDA_VER: '12.0.1', ARCH: 'arm64', PY_VER: '3.12', LINUX_VER: 'rockylinux8' } | |
" |
And therefore ends up pulling rapidsai/ci-wheel:cuda12.0.1-rockylinux8-py3.12
(build link).
We stopped building CUDA 12.0.1 ci-wheel
images in rapidsai/ci-imgs#194... and so no new images with that tag was published with pydistcheck
in it. The latest rapidsai/ci-wheel:cuda12.0.1-rockylinux8-py3.12
was pushed about a month ago: https://hub.docker.com/r/rapidsai/ci-wheel/tags?name=cuda12.0.1-rockylinux8-py3.12
Assuming pynvjitlink
needs to keep building on CUDA 12.0.1 for compatibility reasons, options I see:
- add back 12.0.1 CI images in https://github.com/rapidsai/ci-imgs
- install
pydistcheck
andtwine
at runtime of CI like this
For the purpose of these validation checks, just installing it at runtime should be fine... they're small libraries with few dependencies, that are quick to install.
But having this repo's CI rely on pulling images that are no longer getting updated risks it missing out on other updates (like security patches to libraries in the base image), and on the need to do more special-casing of this repo to use other things newly installed in ci-wheel
images.
@bdice @KyleFromNVIDIA what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure if it’s important for this repo to have size checks. The lowest friction thing is probably to update the CI image tags. I don’t think we’ll have issues from that, please give it a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's helpful to have size checks for anything we're publishing to PyPI, even if we feel it's unlikely they'll grow much... they're cheap maintenance-wise, and help avoid surprises at release time.
PR to add back the 12.0.1 images: rapidsai/ci-imgs#206
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I meant we could use something newer than 12.0.1 in this repo, not add back the 12.0.1 images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohhhhhh I'm sorry, misunderstood what "update the CI image tags" meant! Sure, let me try that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I tried updating the matrix to "all builds on CUDA 12.5.1, tests on a mix of CUDA 12.0.1 and 12.5.1". Looks like the 2 test jobs on CUDA 12.5.1 are failing:
FAIL: test_atomic_add_double_global_3 (numba.cuda.tests.cudapy.test_atomics.TestCudaAtomics.test_atomic_add_double_global_3)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/tests/cudapy/test_atomics.py", line 669, in test_atomic_add_double_global_3
self.assertCorrectFloat64Atomics(cuda_func, shared=False)
File "/opt/conda/envs/test/lib/python3.11/site-packages/numba/cuda/tests/cudapy/test_atomics.py", line 572, in assertCorrectFloat64Atomics
self.assertIn(f'{inst}.add.f64', asm)
AssertionError: 'red.add.f64' not found in '***BIG STRING***'
@bdice could you help me with this? I'm not familiar with the compatibility story for this library, so not sure if failures like that are expected.
I hope we can get wheel size checks running here, but if this would take a lot of effort to debug and if you also want to avoid having a pip install twine pydistcheck
on every wheel-building job (for reasons like rapidsai/shared-actions#22 (comment)), then I'd be fine with skipping this repo for the purpose of this release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I have a good idea of where to start debugging that. @gmarkall may have to help. I think I would prefer to just skip this repository, I don't think we gain much from adding the size checks right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright yeah let's skip it, this seems like a good place to cut scope in favor of all the other things we're trying to land in this release.
Let's close this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put up an issue about it: #116
I still think it'd be good to return to at some point.
Description
Contributes to rapidsai/build-planning#110
Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.