-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use sysroot 2.17 on CUDA < 11.8. #745
Use sysroot 2.17 on CUDA < 11.8. #745
Conversation
/merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Bradley! 🙏
Think CUDA 11 ARM still needs 2.28. Have added a suggestion below
As the CI failure was only on x86_64, think this would be ok
- matrix: | ||
arch: aarch64 | ||
cuda: "11.[2456]" | ||
packages: | ||
- sysroot_linux-aarch64==2.17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARM is not compatible with GLIBC 2.17
$ strings /opt/conda/envs/cuda11/lib/libcurand.so | grep GLIBC
GLIBC_2.17
GLIBC_2.27
GLIBC_2.17
So think we should stick to GLIBC 2.28 here
- matrix: | |
arch: aarch64 | |
cuda: "11.[2456]" | |
packages: | |
- sysroot_linux-aarch64==2.17 | |
- matrix: | |
arch: aarch64 | |
cuda: "11.[2456]" | |
packages: | |
- sysroot_linux-aarch64==2.28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. We can equivalently just erase this and let it use the matrix entry below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that looks cleaner. Let's do it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops: this is not correct. https://anaconda.org/conda-forge/nvcc_linux-aarch64/files?version=11.4
The aarch64 packages for nvcc 11.4 (up to 11.7) require sysroot 2.17. The current state of the PR (using sysroot 2.17) is correct for all nvcc versions < 11.8, and is the same on x86 and ARM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that CUDA libraries require glibc versions newer than 2.17 -- but the nvcc
compiler metapackage for CUDA 11 is only installable with sysroot 2.17
until we get to recent builds of nvcc 11.8
that permit newer sysroots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakirkham I am going to merge this as-is so that we can unblock nightly CI.
I think there's nothing else we should do here, but if you want to lift the restrictions for old CUDA compilers, a follow-up PR would be welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I had looked into this recently. Here is what I observed:
- The overly strict
sysroot
constraint was flagged as a bug - A fix was made to relax it
- The old versions don't have this fix, but could benefit from the same treatment
Typically that is enough to justify a repodata patch and it makes sense
Additionally noticed that nvcc_{{ target_platform }}
's pinnings could be handled more systematically, which is now fixed
Following this submitted a repodata patch
The repodata patch landed before last night's nightly run, which passed in CUDA 11.4 (the affected job)
Note there is a CUDA 12.5 failure in that nightly, but that is due to actual test failures
The following tests FAILED:
570 - install_relocatable-with-gtest_discover_tests-ninja_configure (Failed)
571 - install_relocatable-with-gtest_discover_tests-ninja (Not Run) ninja test
Errors while running CTest
So AFAICT this issue is already fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful. I apologize, I didn't see those PRs, just the end of the Slack conversation, and thought this might have been the best we could do. I appreciate you taking the extra steps, which now allow us to simplify things.
I filed a follow-up to simplify the dependencies.yaml
and use sysroot 2.28
everywhere in #749.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry as well. I meant to update this thread with those details and just hadn't had time to do so. The CUDA 12.8 rollout started very soon after
If you are feeling stuck, please feel free to DM me. Am here to help
Merging to unblock nightly CI, a follow-up may be created later.
This removes some special cases added in #745 around sysroot pinnings. Those were needed for CUDA 11.4 until a recent repodata patch. Thanks @jakirkham! Authors: - Bradley Dice (https://github.com/bdice) Approvers: - https://github.com/jakirkham URL: #749
Description
Nightly CI was failing on CUDA 11.4 due to #741. This fixes it.
Old versions of
nvcc
for CUDA 11 do not support sysroot 2.28 until CUDA 11.8.Checklist
cmake-format.json
is up to date with these changes.include_guard(GLOBAL)
)