
cuBLAS error 15 #853

Closed
WorksButNotTested opened this issue Nov 21, 2023 · 4 comments

Comments

@WorksButNotTested

Since upgrading from v0.4.0 to v0.5.0, I seem to be getting an error when running tabby with CUDA enabled. Here is the output I am seeing. The error seems to occur when I first send a request to the endpoint using the default Swagger template from the web UI.

Describe the bug
... INFO ... crates tabby/src/serve/mods.rs:146: Starting server, this might takes a few minutes...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: GGML_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 CUDA devices:
Device 0: Tesla M60, compute capability 5.2
Device 1: Tesla M60, compute capability 5.2
ggml_cuda_set_main_device: using device 0 (Tesla M60) as main device

... INFO ... crates tabby/src/serve/mods.rs:165: Listening at 0.0.0.0:8080

cuBLAS error 15 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:7282

Information about your version
v0.5.5

Information about your GPU
NVIDIA-SMI 470.223.03 Driver Version: 470.223.02, CUDA Version 11.4
Ubuntu 20.04.6 LTS

@wsxiaoys
Member

Could you try setting the environment variable LLAMA_CPP_PARALLELISM=1? That should reduce VRAM usage.

Related discussion: https://snowshoe.dev/tabbyml/ag5vLJl1ln9

@WorksButNotTested
Author

I added -e LLAMA_CPP_PARALLELISM=1 to my docker command, but I still get the same error?

@wsxiaoys
Member

wsxiaoys commented Nov 23, 2023

https://github.com/TabbyML/llama.cpp/blob/75fb6f2ba0930be1515757196a81d32a1c2ab8ff/ggml-cuda.cu#L7289

Maybe it's related to compute capability 5.2 not supporting fp16 operations.

(In the 0.4 -> 0.5 transition, we switched the default CUDA runtime implementation to llama.cpp, which has a slightly narrower support matrix.)

Related: https://stackoverflow.com/questions/74995164/atomicadd-half-precision-floating-point-fp16-on-cuda-compute-capability-5-2
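
For reference, cuBLAS error 15 is CUBLAS_STATUS_NOT_SUPPORTED. A minimal sketch of checking the device's compute capability before requesting fp16 math could look like the code below; this is illustrative only (not Tabby's or llama.cpp's actual code), and the sm_53 cutoff for native fp16 arithmetic is an assumption based on the linked discussion.

// Illustrative sketch only -- not code from Tabby or llama.cpp.
// Query the GPU's compute capability and fall back to fp32 accumulation when
// native fp16 arithmetic is assumed to be unavailable (pre-sm_53 parts such as
// the Tesla M60 above, which is sm_52).
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, /*device=*/0);
    std::printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    // Assumption: fp16 compute needs at least sm_53; otherwise request fp32.
    const bool fp16_ok = prop.major > 5 || (prop.major == 5 && prop.minor >= 3);
    const cublasComputeType_t compute_type =
        fp16_ok ? CUBLAS_COMPUTE_16F : CUBLAS_COMPUTE_32F;
    std::printf("requesting %s compute\n",
                compute_type == CUBLAS_COMPUTE_16F ? "fp16" : "fp32");
    return 0;
}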

@WorksButNotTested
Author

Is it possible to configure tabby to revert to the previous CUDA runtime?
What did the previous runtime use in place of fp16 operations? Is it possible to change the parameter passed to cublasGemmBatchedEx? Or is there any chance the workaround mentioned in the article would work?
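
For context on the last question, a hypothetical sketch of the same cublasGemmBatchedEx entry point with fp16 inputs but fp32 accumulation is shown below; the function name, pointer-array arguments, and dimensions are placeholders, and whether simply switching the compute type like this would be enough for Tabby on an sm_52 GPU is an assumption, not something confirmed in this thread.

// Hypothetical sketch only -- not llama.cpp's actual call.
// Keeps fp16 (CUDA_R_16F) inputs but requests CUBLAS_COMPUTE_32F so the
// accumulation runs in fp32; the output matrices are fp32 here as well.
// d_Aptrs/d_Bptrs/d_Cptrs are assumed device-side arrays of per-batch pointers.
#include <cublas_v2.h>

cublasStatus_t gemm_batched_fp32_accum(cublasHandle_t handle,
                                       const void* const* d_Aptrs,  // __half, m x k
                                       const void* const* d_Bptrs,  // __half, k x n
                                       void* const* d_Cptrs,        // float,  m x n
                                       int m, int n, int k, int batch_count) {
    const float alpha = 1.0f;
    const float beta  = 0.0f;
    return cublasGemmBatchedEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                               m, n, k,
                               &alpha,
                               d_Aptrs, CUDA_R_16F, m,
                               d_Bptrs, CUDA_R_16F, k,
                               &beta,
                               d_Cptrs, CUDA_R_32F, m,
                               batch_count,
                               CUBLAS_COMPUTE_32F,
                               CUBLAS_GEMM_DEFAULT);
}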

wsxiaoys closed this as not planned on Jun 11, 2024