CUDA_ERROR_NO_DEVICE "no CUDA-capable device is detected" #253

EricLBuehler · 2024-06-12T03:54:57Z

Hello all,

Thanks for your great work here! When I run using cudarc, I get the error:

called `Result::unwrap()` on an `Err` value: Cuda(Cuda(DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected")))

Here is my system information:

$ nvidia-smi
Tue Jun 11 23:53:28 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.72                 Driver Version: 536.45       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro M2000M                  On  | 00000000:01:00.0 Off |                  N/A |
| N/A    0C    P8              N/A / 200W |      0MiB /  4096MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        33      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

$ nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
5.0

$ echo $CUDA_VISIBLE_DEVICES
0

I would appreciate any help!


coreylowman · 2024-06-12T20:53:39Z

Is pytorch able to see the GPU? Also what cuda toolkit version is being targeted by cudarc (if using cuda-version-from-build-system, is it being compiled on this machine?)

coreylowman · 2024-07-16T19:33:32Z

@EricLBuehler any more information on this issue? Will close in a week if not

EricLBuehler · 2024-07-16T19:36:03Z

@coreylowman sorry for not getting back! I am running this on my GPU and PyTorch can see it (torch.cuda.is_available() == True).

coreylowman · 2024-07-16T19:40:02Z

@EricLBuehler are there any differences with dynamic loading vs dynamic linking features for cudarc? Also curious about what toolkit version you are targeting in cudarc features

EricLBuehler · 2024-07-16T19:43:50Z

I am using cuda-version-from-build-system and dynamic-linking. How should I try dynamic loading?

coreylowman · 2024-07-16T19:57:24Z

If you don't enable the dynamic-linking feature it will use dynamic loading.

🤔 Could you try targeting 12.2 (cuda-12020) instead of version from build system? Just curious if that would change anything.
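For reference, the two feature names mentioned in this thread (cuda-12020 for 12.2, cuda-12050 for 12.5) suggest the name is built from the major version, a two-digit minor version, and a trailing zero. Below is a minimal sketch of that mapping, assuming the pattern holds for other versions. `cudarc_feature` is a hypothetical helper written for illustration, not part of cudarc, and the result should be checked against the features the crate actually lists.

```rust
/// Hypothetical helper (not part of cudarc): turns a "major.minor" CUDA
/// version string, e.g. the "CUDA Version" shown by nvidia-smi, into a
/// feature name following the pattern seen in this thread.
/// Assumption: the name is "cuda-" + major + two-digit minor + "0".
fn cudarc_feature(version: &str) -> Option<String> {
    // Split "12.2" into ("12", "2"); return None if there is no dot.
    let (major, minor) = version.trim().split_once('.')?;
    let major: u32 = major.parse().ok()?;
    let minor: u32 = minor.parse().ok()?;
    Some(format!("cuda-{major}{minor:02}0"))
}

fn main() {
    // "12.2" is the CUDA version reported by nvidia-smi in the original post.
    match cudarc_feature("12.2") {
        Some(feature) => println!("{feature}"),
        None => eprintln!("could not parse CUDA version"),
    }
}
```

The resulting name would then replace cuda-version-from-build-system in the features list in Cargo.toml.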

EricLBuehler · 2024-07-16T19:59:35Z

Hmm yeah, same error.
Current:

cudarc = { version = "0.11.5", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-12020"], default-features=false }

coreylowman · 2024-07-16T20:50:05Z

I got nothing off the top of my head. Do you get this error if you git clone cudarc and try to run the unit tests?

cargo test --tests --no-default-features -F std,cuda-12050,driver

Is this running inside a docker container?

If that doesn't work, I'd probably drop down to the C++ level and verify that a simple example linked against CUDA can find the GPU. If that fails too, it at least tells us that PyTorch is doing something special that we need to copy.

jianshu93 · 2024-07-17T15:27:34Z

Hi both, I also have a similar error:

DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed")
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted
[jzhao399@atl1-1-02-018-25-0 release]$ which nvidia-smi
/usr/bin/nvidia-smi
[jzhao399@atl1-1-02-018-25-0 release]$ nvidia-smi
Wed Jul 17 11:25:54 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  |   00000000:C1:00.0 Off |                    0 |
| N/A   34C    P0             43W /  250W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

With PyTorch this can be solved, but I'm not sure how to solve it here.

Thanks,

Jianshu
