cuBLAS error 15 #853
Comments
Could you try setting the environment variable? Related discussion: https://snowshoe.dev/tabbyml/ag5vLJl1ln9
I added it.
Maybe it's related: compute capability 5.2 doesn't support fp16 operations. (In the 0.4 → 0.5 transition, we switched the default CUDA runtime implementation to llama.cpp, which has a slightly narrower support matrix.)
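For context on why 5.2 misses the cutoff: CUDA's native half-precision (fp16) arithmetic generally requires compute capability 5.3 or higher, so the Tesla M60 at 5.2 falls just below the bar. A minimal sketch of that check (the threshold is a CUDA fact; the helper itself is illustrative, not part of Tabby):

```python
# Sketch: decide whether a GPU's compute capability supports native fp16
# arithmetic. CUDA half-precision intrinsics require compute capability
# 5.3+; the Tesla M60 from the log reports 5.2, just below that.

FP16_MIN_CAPABILITY = (5, 3)

def supports_fp16(compute_capability: str) -> bool:
    """Parse a capability string like '5.2' and compare against 5.3."""
    major, minor = (int(part) for part in compute_capability.split("."))
    return (major, minor) >= FP16_MIN_CAPABILITY

print(supports_fp16("5.2"))  # False -> the M60 case above
print(supports_fp16("7.5"))  # True
```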
Is it possible to configure tabby to revert back to the previous cuda runtime?
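One possible workaround (my assumption, not an official recommendation): pin the container image back to the v0.4.0 tag, which predates the llama.cpp switch. Adjust the model name, port, and data path to match your own setup:

```shell
# Hypothetical rollback sketch: run the 0.4.0 image instead of latest.
# Model name, port, and volume path here are placeholders for your config.
docker run -it --gpus all -p 8080:8080 -v "$HOME/.tabby:/data" \
  tabbyml/tabby:0.4.0 serve --model TabbyML/StarCoder-1B --device cuda
```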
Since upgrading from v0.4.0 to v0.5.0, I seem to be getting an error running tabby when enabling cuda. Here is the output I am seeing. It seems the error occurs when I first send a request to the endpoint using the swagger default template from the web UI.
Describe the bug
```
... INFO ... crates tabby/src/serve/mods.rs:146: Starting server, this might takes a few minutes...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: GGML_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 CUDA devices:
  Device 0: Tesla M60, compute capability 5.2
  Device 1: Tesla M60, compute capability 5.2
ggml_cuda_set_main_device: using device 0 (Telsa M60) as main device
... INFO ... crates tabby/src/serve/mods.rs:165: Listening at 0.0.0.0:8080
cuBLAS error 15 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:7282
```
Information about your version
v0.5.5
Information about your GPU
NVIDIA-SMI 470.223.03 Driver Version: 470.223.02, CUDA Version 11.4
Ubuntu 20.04.6 LTS