Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sysroot 2.17 on CUDA < 11.8. #745

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions dependencies.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,16 @@ dependencies:
- gcc<14.0.0
- output_types: conda
matrices:
- matrix:
arch: x86_64
cuda: "11.[2456]"
packages:
- sysroot_linux-64==2.17
- matrix:
arch: aarch64
cuda: "11.[2456]"
packages:
- sysroot_linux-aarch64==2.17
Comment on lines +64 to +68
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARM is not compatible with GLIBC 2.17

$ strings /opt/conda/envs/cuda11/lib/libcurand.so | grep GLIBC
GLIBC_2.17
GLIBC_2.27
GLIBC_2.17

So think we should stick to GLIBC 2.28 here

Suggested change
- matrix:
arch: aarch64
cuda: "11.[2456]"
packages:
- sysroot_linux-aarch64==2.17
- matrix:
arch: aarch64
cuda: "11.[2456]"
packages:
- sysroot_linux-aarch64==2.28

Copy link
Contributor Author

@bdice bdice Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. We can equivalently just erase this and let it use the matrix entry below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that looks cleaner. Let's do it 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops: this is not correct. https://anaconda.org/conda-forge/nvcc_linux-aarch64/files?version=11.4

The aarch64 packages for nvcc 11.4 (up to 11.7) require sysroot 2.17. The current state of the PR (using sysroot 2.17) is correct for all nvcc versions < 11.8, and is the same on x86 and ARM.

Copy link
Contributor Author

@bdice bdice Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that CUDA libraries require glibc versions newer than 2.17 -- but the nvcc compiler metapackage for CUDA 11 is only installable with sysroot 2.17 until we get to recent builds of nvcc 11.8 that permit newer sysroots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakirkham I am going to merge this as-is so that we can unblock nightly CI.

I think there's nothing else we should do here, but if you want to lift the restrictions for old CUDA compilers, a follow-up PR would be welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I had looked into this recently. Here is what I observed:

Typically that is enough to justify a repodata patch and it makes sense

Additionally noticed that nvcc_{{ target_platform }}'s pinnings could be handled more systematically, which is now fixed

Following this submitted a repodata patch

The repodata patch landed before last night's nightly run, which passed in CUDA 11.4 (the affected job)

Note there is a CUDA 12.5 failure in that nightly, but that is due to actual test failures

The following tests FAILED:
	570 - install_relocatable-with-gtest_discover_tests-ninja_configure (Failed)
	571 - install_relocatable-with-gtest_discover_tests-ninja (Not Run) ninja test
Errors while running CTest

So AFAICT this issue is already fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful. I apologize, I didn't see those PRs, just the end of the Slack conversation, and thought this might have been the best we could do. I appreciate you taking the extra steps, which now allow us to simplify things.

I filed a follow-up to simplify the dependencies.yaml and use sysroot 2.28 everywhere in #749.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry as well. I meant to update this thread with those details and just hadn't had time to do so. The CUDA 12.8 rollout started very soon after

If you are feeling stuck, please feel free to DM me. Am here to help

- matrix:
arch: x86_64
packages:
Expand Down
Loading