Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebuild for CUDA 12 #39

Merged

Conversation

regro-cf-autotick-bot
Copy link
Contributor

@regro-cf-autotick-bot regro-cf-autotick-bot commented Jun 6, 2023

This PR has been triggered in an effort to update cuda120.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself don't close this PR until the your rebuild has been merged.


Here are some more details about this specific migrator:

The transition to CUDA 12 SDK includes new packages for all CUDA libraries and build tools. Notably, the cudatoolkit package no longer exists, and packages should depend directly on the specific CUDA libraries (libcublas, libcusolver, etc) as needed. For an in-depth overview of the changes and to report problems see this issue. Please feel free to raise any issues encountered there. Thank you! 🙏


If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/5192514335, please use this URL for debugging.

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jakirkham
Copy link
Member

Seeing the following error on CI:

CMake Error at /home/conda/feedstock_root/build_artifacts/kaldi-split_1686080997559/_build_env/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find NvToolExt (missing: NvToolExt_INCLUDE_DIR
  NvToolExt_LIBRARIES

As the new CUDA 12 packages split all these components up and standard conda-forge Docker images are used ( conda-forge/staged-recipes#21382 ), think we need to add cuda-nvtx to host:

  host:
    - cuda-nvtx      # [(cuda_compiler_version or "").startswith("12")]

Edit: For more info about the CUDA 12 bringup, please see issue ( conda-forge/conda-forge.github.io#1963 )

@mmcauliffe
Copy link
Contributor

Trying to push the fixes to the bot branch is giving me an access denied error. Are there any other extra permissions that I need in order to do it? From @h-vetinari's comments on https://matrix.to/#/!SOyumkgPRWoXfQYIFH:matrix.org/$168168775831YVYfK:gitter.im?via=matrix.org&via=gitter.im&via=cadair.com, it seems like having accepted the team invites should let me push to this branch, but is there something else I'm missing?

@jakirkham
Copy link
Member

@conda-forge-admin, please update team

@jakirkham
Copy link
Member

jakirkham commented Jun 8, 2023

Actually just tried that in an issue ( #40 ), which worked

Edit: Maybe give it another try?

@mmcauliffe
Copy link
Contributor

Still getting a permission denied error from local machine, tried editing it via codespaces and it gave the following warning:

You don't have write access to the regro-cf-autotick-bot/kaldi-feedstock repository, so you cannot push changes to it.

@jakirkham
Copy link
Member

Could you please share how you are trying to push?

Also do you use SSH with git / GitHub?

@mmcauliffe
Copy link
Contributor

Yes, it's using SSH with git, both through GitKraken with my github account and manually adding the remote in command line. I tried using HTTPS, but it gave an error "Support for password authentication was removed on August 13, 2021".

@jakirkham
Copy link
Member

Ok, could you please share the command lines run?

@mmcauliffe
Copy link
Contributor

So even:

git remote add bot [email protected]:regro-cf-autotick-bot/kaldi-feedstock.git
git fetch bot

gives

[email protected]: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

@h-vetinari
Copy link
Member

So even: [...] git fetch bot gives

This is almost certainly a misconfiguration issue on your end. Fetching from a remote has nothing to do with your maintainer rights, it should always be possible.

@mmcauliffe
Copy link
Contributor

@jakirkham Ok got the git issues sorted, but it looks like it's still having the same issue even with cuda-nvtx in host. Do you know if there's any interaction with cmake's find_package for CUDAToolkit and CUDA (relevant lines here: https://github.com/kaldi-asr/kaldi/blob/master/CMakeLists.txt#L169)?

@mmcauliffe
Copy link
Contributor

Hmm, I added:


        - libcublas-dev      # [(cuda_compiler_version or "").startswith("12")]
        - libcufft-dev      # [(cuda_compiler_version or "").startswith("12")]
        - libcurand-dev      # [(cuda_compiler_version or "").startswith("12")]
        - libcusolver-dev      # [(cuda_compiler_version or "").startswith("12")]
        - libcusparse-dev      # [(cuda_compiler_version or "").startswith("12")]

to host as well, following conda-forge/conda-forge.github.io#1963, but it's still throwing the same error.

recipe/meta.yaml Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member

Think we need to add this to the cmake calls in the build scripts:

    -DNvToolExt_INCLUDE_DIR:PATH="${PREFIX}/include"

@mmcauliffe
Copy link
Contributor

Ok I added

-DNvToolExt_INCLUDE_DIR:PATH="${PREFIX}/include" \
-DNvToolExt_LIBRARIES:PATH="${PREFIX}/lib" \

But now it's giving the error of:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_CUDA_LIBRARY (ADVANCED)

It looks like these issues are fixed via something like -DCMAKE_LIBRARY_PATH=/usr/local/cuda/lib64/stubs in floydhub/dl-docker#48 for instance, but CMAKE_LIBRARY_PATH should be properly set. Did the location of cuda change or is there some other reason it can't find the cuda library?

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry looks like it needs to be NvToolExt_ROOT_DIR

recipe/build_kaldi.sh Outdated Show resolved Hide resolved
recipe/build_kaldi.bat Outdated Show resolved Hide resolved
@mmcauliffe
Copy link
Contributor

Hmm, using just -DNvToolExt_ROOT_DIR:PATH="${PREFIX}" gives the old error of:

Could NOT find NvToolExt (missing: NvToolExt_INCLUDE_DIR
  NvToolExt_LIBRARIES)

@mmcauliffe
Copy link
Contributor

Ok so it looks like the include files from nvtx-c are installed to include/nvtx3, and there's only libnvToolsExt.so.1 installed via cuda-nvtx, no libnvToolsExt.so, but after updating FindNvToolExt.cmake, we're still back to the issue with CUDA_CUDA_LIBRARY being not found. Is there an issue with CUDA 12 not exporting the same files to lib/include as prior versions?

@jakirkham
Copy link
Member

Think we want to only use cuda-nvtx with CUDA 12. Think we need to drop nvtx-c (having both may be confusing and CUDA 11 has this in the Docker image already)

We may need to tinker with the search path for CUDA 12. Will try to include some suggestions for this a bit later

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had tried setting the path to the NVTX targets directory (as explained below). However that didn't seem to work. Providing this context in case you know of a better way to try configuring Kaldi with this information

recipe/build_kaldi.sh Outdated Show resolved Hide resolved
recipe/build_kaldi.sh Outdated Show resolved Hide resolved
@mmcauliffe
Copy link
Contributor

@jakirkham I can play around a bit more with it, but I guess I'm still confused why cuda-nvtx is installing files outside of the normal ${PREFIX}/lib and ${PREFIX}/include, which seems like a fundamental assumption of where files are installed on conda. Is it really necessary for them to be installed into a platform specific target folder? Will this not cause issues on install with the ${PREFIX}/targets/x86_64-linux not being automatically included in the PATH like ${PREFIX}? This might just be me not fully understanding how conda works internally.

@jakirkham
Copy link
Member

This is better explained in issue ( conda-forge/conda-forge.github.io#1963 ). Please read over that and ask any general questions on that info in that issue

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I was trying to look for recipes to lint for you, but it appears we have a merge conflict.
Please try to merge or rebase with the base branch to resolve this conflict.

Please ping the 'conda-forge/core' team (using the @ notation in a comment) if you believe this is a bug.

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@mmcauliffe mmcauliffe requested a review from jakirkham July 29, 2023 00:06
@mmcauliffe
Copy link
Contributor

@jakirkham ok I think I have everything building correctly, had to add a few more dependencies and ensure that the lib/stubs folder was on the search path for cmake. Can you take a look over for sanity?

Copy link
Member

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing through on this @mmcauliffe! Just one nit about the build-script. I'd also like you to squash everything into two commits - your changes in one, and the rerender in another. If you want I can help with that.

recipe/build_kaldi.sh Show resolved Hide resolved
@h-vetinari
Copy link
Member

h-vetinari commented Jul 29, 2023

I'd also like you to squash everything into two commits - your changes in one, and the rerender in another. If you want I can help with that.

It might be slightly more than two after all. But less than 29. ;-)
(I'd especially like to avoid adding things that we're removing again immediately after; unless those changes tell a story that remains relevant even after this PR).

@mmcauliffe mmcauliffe force-pushed the rebuild-cuda120-0-1_ha381ca branch from 3c50d7c to 8d11b04 Compare July 29, 2023 05:18
@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I was trying to look for recipes to lint for you, but it appears we have a merge conflict.
Please try to merge or rebase with the base branch to resolve this conflict.

Please ping the 'conda-forge/core' team (using the @ notation in a comment) if you believe this is a bug.

@mmcauliffe mmcauliffe force-pushed the rebuild-cuda120-0-1_ha381ca branch from 8d11b04 to 9f7fe4d Compare July 29, 2023 05:20
@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@mmcauliffe
Copy link
Contributor

Ok, I think I squashed them right, but let me know if I should have done it a different way! I also rebased onto the latest main commit, and I think everything should pass still.

The transition to CUDA 12 SDK includes new packages for all CUDA libraries and
build tools. Notably, the cudatoolkit package no longer exists, and packages
should depend directly on the specific CUDA libraries (libblas, libcusolver,
etc) as needed. For an in-depth overview of the changes and to report problems
[see this issue]( conda-forge/conda-forge.github.io#1963 ).
Please feel free to raise any issues encountered there. Thank you! 🙏
@h-vetinari h-vetinari force-pushed the rebuild-cuda120-0-1_ha381ca branch from 9f7fe4d to 608c6d1 Compare July 29, 2023 06:26
@h-vetinari
Copy link
Member

h-vetinari commented Jul 29, 2023

Ok, I think I squashed them right, but let me know if I should have done it a different way!

It was pretty good already, thanks! The only thing I though would be cleaner to leave in the commits of the migrator. I reinstated those1, and I also separated out the commit aimed at #41, and rerendered again at the end. Otherwise I didn't change anything.

Footnotes

  1. the committer still changed because we rebased on main.

@h-vetinari h-vetinari force-pushed the rebuild-cuda120-0-1_ha381ca branch from 608c6d1 to 4c0322c Compare July 29, 2023 06:39
@h-vetinari h-vetinari force-pushed the rebuild-cuda120-0-1_ha381ca branch from 4c0322c to 9e42107 Compare July 29, 2023 06:46
@h-vetinari
Copy link
Member

Sorry for the repeated force-pushes, something weird was happening with the rerender, and it seems the new conda build that just got released does a different (=better) job. The last rerender now does a much more focused job too. :)

Copy link
Member

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now LGTM! Thanks a lot for your persistence on this! :)

@mmcauliffe mmcauliffe merged commit e482f55 into conda-forge:main Jul 29, 2023
@regro-cf-autotick-bot regro-cf-autotick-bot deleted the rebuild-cuda120-0-1_ha381ca branch July 29, 2023 08:32
@jakirkham
Copy link
Member

Generally this looks reasonable. Thank you both for working on it! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants