Cuda error rtx3090, 3080 LHR #105

Open
Kinzo1011 opened this issue Nov 18, 2021 · 5 comments

Comments

@Kinzo1011

Unexpected error CUDA error in func set_constants at line 180 calling cudaMemcpyToSymbol(d_dag, &_dag, sizeof(hash64_t*)) failed with error invalid device symbol on CUDA device 09:00.0

@SirSquoll

SirSquoll commented Dec 15, 2021

I'm getting this error as well when I run the miner, except on device 0a:00:0.

OS: Ubuntu 20.04
GPU: RTX 3080 LHR
Nvidia driver: 495.44
CUDA version: 11.5
I tried rolling back to driver 470 to see if the CUDA version would roll back with it, but that was a mistake (it reintroduced a shared library error), and I should have known it wasn't going to be that simple.

This is the same error that was supposedly (hopefully) fixed by the current release in May. From what I can gather, the cause is missing support for CUDA 11, which the RTX 30 series requires. But this update was supposed to be specifically for those cards, so I'm not sure what happened.
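
One way to check that hypothesis, as a rough sketch: the CUDA toolkit's cuobjdump can list the GPU code embedded in a binary, and RTX 30 series cards are compute capability 8.6 (sm_86). The helper name embedded_archs and the binary path ./kawpowminer are assumptions for illustration. A missing sm_86 entry is only a hint, not proof, since embedded PTX can still be compiled for newer GPUs at load time.

```shell
#!/bin/sh
# Hypothetical helper: pull the distinct sm_XX architecture tags out of
# cuobjdump's ELF listing (read on stdin).
embedded_archs() {
    grep -o 'sm_[0-9][0-9]*' | sort -u
}

# Usage (assumes the CUDA toolkit is installed and the miner binary is
# in the current directory):
#   cuobjdump --list-elf ./kawpowminer | embedded_archs
```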

I spent 3+ hours trying to build it from the GitHub clone and fixing a few errors. Of note: it would not compile properly until I downgraded GCC from version 9 to version 7, and I had to give explicit paths in the cmake command for libcuda.so and libnvrtc, even though both were installed in the default install location. Now I get this error. My head hurts, heh; I might take another look tomorrow with a fresh mind.

Maybe I'll start from scratch again with the old 460 driver, which was current in early-to-mid May when this was released, along with an earlier CUDA version (see the tables in section 1.1; CUDA 11 was introduced with driver 450: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html).

There is an earlier report where testing and feedback were requested but none was given, so it was closed and unpinned. I wonder if the latest CUDA 11 version is causing grief?
#54

@fdoving

fdoving commented Dec 15, 2021

@fdoving fdoving changed the title Cuda rtx3090 Cuda error rtx3090, 3080 LHR Dec 15, 2021
@SirSquoll

SirSquoll commented Dec 15, 2021

At first, yes, but that gave me problems right off the bat as well. Then I tried cloning (described above) and ran into multiple errors, but I was able to work through them with the help of this write-up (edit: forgot link https://xangis.com/building-kawpowminer-from-source-on-ubuntu-20-04-linux/). I was able to compile everything OK, but upon running the miner I got the error that is the subject of this thread. After a few minutes I gave up for the night and posted my comment above. I just tried the file you linked again, though.

The executable gives an "error while loading shared libraries: libnvrtc.so.11.2: cannot open shared object file: No such file or directory" message. That file is present in one of the directories of my Folding@Home installation, and that application works as expected. It seems to have been included in the installation files, though. When I copy it into the directory holding the extracted binary, I get the same error but for libnvrtc.so.10.1.
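
For errors like this, filtering the output of ldd shows every shared library the binary cannot resolve, not just the first one the loader complains about. This is a minimal sketch; the function name missing_libs is made up for illustration, and it assumes the binary is ./kawpowminer in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print the names of shared libraries that ldd
# reports as "not found". Reads ldd-style output on stdin.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (assumes the miner binary is in the current directory):
#   ldd ./kawpowminer | missing_libs
# To see which libnvrtc versions the dynamic linker already knows about:
#   ldconfig -p | grep libnvrtc
```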

@fdoving

fdoving commented Dec 15, 2021

It is built for CUDA 11.2, try that.

I will look into building for newer cuda versions, or making it completely static.

@SirSquoll

SirSquoll commented Dec 16, 2021

I am happy to report positive results but still not sure why it works now (not complaining haha):

  • Installed nvidia-driver-460 via Dolphin/Additional Drivers
  • nvidia-smi showed driver 460.106.00/CUDA Version 11.2
  • The executable gave error ./kawpowminer: error while loading shared libraries: libnvrtc.so.11.2: cannot open shared object file: No such file or directory
  • Deleted all clone and build files, recloned from scratch from git, and followed the Readme instructions.
  • cmake resulted in a single error this time but different than the previous errors I described. It was CMake Error at /usr/share/cmake-3.16/Modules/FindCUDA.cmake:707 (message): Specify CUDA_TOOLKIT_ROOT_DIR Call Stack (most recent call first): libhwmon/CMakeLists.txt:19 (find_package)
  • nvcc --version did not run; the shell said to install nvidia-cuda-toolkit via apt-get
  • apt list nvidia-cuda-toolkit returned nvidia-cuda-toolkit/focal,now 10.1.243-3 amd64 [residual-config], which is 10.1, not the 11.2 you specified above.
  • Went to Nvidia's website and used the wget/dpkg instructions in their archive to install CUDA Toolkit 11.2, except I got confused and ran both commands for 11.5.
  • Noticed my error and re-ran them for 11.2, but when I finished the installation instructions I saw, much to my dismay, 11.5 being installed. Afterwards nvidia-smi showed driver 495.29.05 and CUDA 11.5, essentially the same as what I had when I made my initial report yesterday, except that I had driver 495.44 then.
  • Tried running the cloned miner again and to my surprise it ran, for 5 minutes, before 'SIGSEGV encountered' essentially crashed it.
  • Remembered I had never successfully finished running the cmake command, so I immediately ran it per the Readme instructions, followed by the final 'make' command.
  • Results: Both the cloned miner and the executable miner seem to be running without any errors now (15+ minutes). I'm not sure why it works now but if you know or have suspicions I'm curious to hear them.
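
Since the steps above hinge on which CUDA version nvidia-smi reports, here is a small sketch for reading that field in a script. The function name driver_cuda_version is hypothetical. Note that the value nvidia-smi shows is the highest CUDA version the installed driver supports, not the version of the toolkit itself, which nvcc --version reports separately.

```shell
#!/bin/sh
# Hypothetical helper: extract the "CUDA Version" field from nvidia-smi's
# header text (read on stdin). Prints only the first match.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on a machine with the NVIDIA driver installed:
#   nvidia-smi | driver_cuda_version
#   nvcc --version    # toolkit version, if the toolkit is on PATH
```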

Edited to add proper code notation.
