CUDA_ERROR_NO_DEVICE "no CUDA-capable device is detected" #253

Open
EricLBuehler opened this issue Jun 12, 2024 · 9 comments

Comments

@EricLBuehler

Hello all,

Thanks for your great work here! When I run my code using cudarc, I get this error:

called `Result::unwrap()` on an `Err` value: Cuda(Cuda(DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected")))

Here is my system information:

$ nvidia-smi
Tue Jun 11 23:53:28 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.72                 Driver Version: 536.45       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro M2000M                  On  | 00000000:01:00.0 Off |                  N/A |
| N/A    0C    P8              N/A / 200W |      0MiB /  4096MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        33      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

$ nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
5.0

$ echo $CUDA_VISIBLE_DEVICES
0

I would appreciate any help!
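For reference, NVIDIA's CUDA documentation describes `CUDA_VISIBLE_DEVICES` as a comma-separated list of device identifiers in which parsing stops at the first invalid entry. An empty or malformed value therefore hides every GPU, which also surfaces as `CUDA_ERROR_NO_DEVICE`. The reported value `0` should expose the single Quadro M2000M. The sketch below is a simplified, hypothetical model of that rule (numeric indices only; UUIDs, MIG identifiers and duplicate entries are not handled), and `visible_indices` is an illustrative name, not a cudarc or CUDA API.

```rust
use std::env;

/// Hypothetical, simplified model of how CUDA reads CUDA_VISIBLE_DEVICES:
/// entries are read left to right and parsing stops at the first invalid one.
fn visible_indices(value: Option<&str>, physical_count: u32) -> Vec<u32> {
    let value = match value {
        // Variable unset: every physical device is visible.
        None => return (0..physical_count).collect(),
        Some(v) => v,
    };
    let mut visible = Vec::new();
    for entry in value.split(',') {
        match entry.parse::<u32>() {
            Ok(i) if i < physical_count => visible.push(i),
            // First invalid entry (including an empty string): stop here,
            // hiding it and everything after it.
            _ => break,
        }
    }
    visible
}

fn main() {
    let raw = env::var("CUDA_VISIBLE_DEVICES").ok();
    // Assumes one physical GPU, as in the nvidia-smi output above.
    let visible = visible_indices(raw.as_deref(), 1);
    println!("visible device indices: {:?}", visible);
}
```

With the reported value `0` and one physical GPU, `visible_indices(Some("0"), 1)` returns `[0]`, so this variable alone should not hide the device.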

@coreylowman
Owner

Is PyTorch able to see the GPU? Also, which CUDA toolkit version is cudarc targeting? (If you are using cuda-version-from-build-system, is it being compiled on this machine?)
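To put the toolkit-version question next to the data already posted: `nvidia-smi` reports the newest CUDA version the installed driver supports (12.2 here), while `nvcc -V` reports the installed toolkit release (12.5). The minimal sketch below extracts both from text shaped like the output above so they can be compared. The function names are hypothetical, and whether this particular gap explains the error is not established in this thread.

```rust
/// Parses "X.Y" into (major, minor).
fn parse_major_minor(s: &str) -> Option<(u32, u32)> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse::<u32>().ok()?;
    let minor = parts.next()?.parse::<u32>().ok()?;
    Some((major, minor))
}

/// Extracts the "CUDA Version: X.Y" field from nvidia-smi output
/// (the newest CUDA version the driver supports).
fn driver_cuda_version(nvidia_smi: &str) -> Option<(u32, u32)> {
    let rest = nvidia_smi.split("CUDA Version:").nth(1)?;
    parse_major_minor(rest.split_whitespace().next()?)
}

/// Extracts "release X.Y" from `nvcc -V` output (the installed toolkit version).
fn toolkit_version(nvcc: &str) -> Option<(u32, u32)> {
    let rest = nvcc.split("release ").nth(1)?;
    parse_major_minor(rest.split(',').next()?)
}

fn main() {
    // Lines copied from the report above.
    let smi = "| NVIDIA-SMI 535.72   Driver Version: 536.45   CUDA Version: 12.2     |";
    let nvcc = "Cuda compilation tools, release 12.5, V12.5.40";
    let driver = driver_cuda_version(smi).expect("no CUDA Version field");
    let toolkit = toolkit_version(nvcc).expect("no release field");
    // Tuples compare lexicographically: (12, 5) > (12, 2).
    if toolkit > driver {
        println!("toolkit {:?} is newer than the driver supports {:?}", toolkit, driver);
    } else {
        println!("driver {:?} covers toolkit {:?}", driver, toolkit);
    }
}
```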

@coreylowman
Owner

@EricLBuehler any more information on this issue? I will close it in a week if not.

@EricLBuehler
Author

@coreylowman sorry for not getting back to you! I am running this on my GPU, and PyTorch can see it (torch.cuda.is_available() == True).

@coreylowman
Owner

@EricLBuehler is there any difference between dynamic loading and the dynamic-linking feature in cudarc? I'm also curious which toolkit version you are targeting in the cudarc features.

@EricLBuehler
Author

I am using cuda-version-from-build-system and dynamic-linking. How should I try dynamic loading?

@coreylowman
Owner

If you don't enable the dynamic-linking feature, it will use dynamic loading.

🤔 Could you try targeting 12.2 (cuda-12020) instead of cuda-version-from-build-system? Just curious whether that would change anything.
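The feature names used in this thread follow a visible pattern (12.2 is `cuda-12020`, 12.5 is `cuda-12050`). As a hypothetical helper, assuming that pattern holds, the sketch below derives the feature name from a version and, mirroring the suggestion above, picks the older of the driver-supported and toolkit versions. That selection rule is an assumption for experimentation, not documented cudarc behavior.

```rust
type Version = (u32, u32);

/// Hypothetical helper: formats a CUDA version as a cudarc feature name,
/// following the pattern seen in this thread (12.2 -> "cuda-12020").
fn cudarc_feature(v: Version) -> String {
    format!("cuda-{}", v.0 * 1000 + v.1 * 10)
}

/// Assumption: target the older of the driver-supported version and the
/// toolkit version, as in the suggestion to try 12.2 instead of 12.5.
fn feature_to_try(driver: Version, toolkit: Version) -> String {
    cudarc_feature(driver.min(toolkit))
}

fn main() {
    // Values from the nvidia-smi and nvcc output in the original report.
    let driver = (12, 2);
    let toolkit = (12, 5);
    println!("{}", feature_to_try(driver, toolkit)); // cuda-12020
}
```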

@EricLBuehler
Author

Hmm, yeah, same error.
My current dependency line:

cudarc = { version = "0.11.5", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-12020"], default-features=false }

@coreylowman
Owner

I've got nothing off the top of my head. Do you get this error if you git clone cudarc and run the unit tests?

cargo test --tests --no-default-features -F std,cuda-12050,driver

Is this running inside a docker container?

If that doesn't work, I'd probably drop down to the C++ level and verify that a simple example linked against CUDA finds the GPU. If that doesn't work either, it at least tells us that PyTorch is doing something special that we need to copy.
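The suggestion above is a minimal C++ program; a comparable check that stays in Rust but bypasses cudarc entirely is sketched below. It loads the driver library with `dlopen` and calls the driver API functions `cuInit` and `cuDeviceGetCount` directly. It assumes Linux with glibc and the library name `libcuda.so.1`; `driver_device_count` is a hypothetical name, and a nonzero code is a raw `CUresult`, where 100 is `CUDA_ERROR_NO_DEVICE`.

```rust
use std::ffi::{c_char, c_int, c_uint, c_void, CString};

// Dynamic-loader entry points from the platform C library (assumes Linux/glibc).
extern "C" {
    fn dlopen(filename: *const c_char, flag: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

const RTLD_NOW: c_int = 2; // glibc value

// Driver API signatures; CUresult is a C enum, i.e. an int.
type CuInit = unsafe extern "C" fn(flags: c_uint) -> c_int;
type CuDeviceGetCount = unsafe extern "C" fn(count: *mut c_int) -> c_int;

/// Returns the device count reported by the driver, or a description of the failure.
fn driver_device_count() -> Result<i32, String> {
    unsafe {
        let lib = CString::new("libcuda.so.1").unwrap();
        // The handle is intentionally never closed; this is a one-shot check.
        let handle = dlopen(lib.as_ptr(), RTLD_NOW);
        if handle.is_null() {
            return Err("could not dlopen libcuda.so.1".to_string());
        }
        let init_name = CString::new("cuInit").unwrap();
        let count_name = CString::new("cuDeviceGetCount").unwrap();
        let init_ptr = dlsym(handle, init_name.as_ptr());
        let count_ptr = dlsym(handle, count_name.as_ptr());
        if init_ptr.is_null() || count_ptr.is_null() {
            return Err("driver symbols not found".to_string());
        }
        let cu_init = std::mem::transmute::<*mut c_void, CuInit>(init_ptr);
        let cu_device_get_count = std::mem::transmute::<*mut c_void, CuDeviceGetCount>(count_ptr);

        // cuInit must succeed before any other driver call.
        let rc = cu_init(0);
        if rc != 0 {
            return Err(format!("cuInit failed with CUresult {}", rc));
        }
        let mut count: c_int = 0;
        let rc = cu_device_get_count(&mut count);
        if rc != 0 {
            return Err(format!("cuDeviceGetCount failed with CUresult {}", rc));
        }
        Ok(count)
    }
}

fn main() {
    match driver_device_count() {
        Ok(n) => println!("driver reports {} device(s)", n),
        Err(e) => println!("driver check failed: {}", e),
    }
}
```

If this also fails with code 100, the problem likely sits below cudarc (driver or environment); if it finds the device, the difference likely lies in how cudarc loads or links the driver.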

@jianshu93

Hi both, I also have a similar error:

DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed")
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted
[jzhao399@atl1-1-02-018-25-0 release]$ which nvidia-smi
/usr/bin/nvidia-smi
[jzhao399@atl1-1-02-018-25-0 release]$ nvidia-smi
Wed Jul 17 11:25:54 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  |   00000000:C1:00.0 Off |                    0 |
| N/A   34C    P0             43W /  250W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

With PyTorch this can be solved, but I'm not sure how to solve it here.

Thanks,

Jianshu
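This second report is a different error, `CUDA_ERROR_INVALID_PTX` ("a PTX JIT compilation failed"). One hypothesis worth testing, not confirmed in this thread, is whether the PTX being JIT-compiled targets an architecture the device and driver accept. The sketch below turns the value printed by `nvidia-smi --query-gpu=compute_cap --format=csv` (5.0 for the Quadro in the first report; an A100 is compute capability 8.0) into an NVRTC `--gpu-architecture=compute_XY` option. The helper names are hypothetical, and how the option is passed through cudarc's NVRTC wrapper is left out.

```rust
/// Hypothetical helper: converts a compute capability such as "5.0" or "8.0"
/// into an NVRTC virtual-architecture option such as "--gpu-architecture=compute_50".
fn nvrtc_arch_flag(compute_cap: &str) -> Option<String> {
    let (major, minor) = compute_cap.trim().split_once('.')?;
    let major: u32 = major.parse().ok()?;
    let minor: u32 = minor.parse().ok()?;
    Some(format!("--gpu-architecture=compute_{}{}", major, minor))
}

/// Returns the first data row of `nvidia-smi --query-gpu=compute_cap --format=csv`
/// output, skipping the "compute_cap" header line.
fn first_compute_cap(csv: &str) -> Option<&str> {
    csv.lines().nth(1)
}

fn main() {
    // Output format copied from the first report in this thread.
    let csv = "compute_cap\n5.0\n";
    if let Some(flag) = first_compute_cap(csv).and_then(nvrtc_arch_flag) {
        println!("{}", flag); // --gpu-architecture=compute_50
    }
}
```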
