llama_cpp_python server > 0.2.79 breaks the vulkan image #742

lstocchi · 2024-08-08T09:17:59Z

When you build the Vulkan image with llama_cpp_python 0.2.79, it detects and uses the GPU, as the logs show:

ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: Virtio-GPU Venus (Apple M2 Pro) (venus) | uma: 1 | fp16: 1 | warp size: 32
llm_load_tensors: ggml ctx size =  0.30 MiB
warning: failed to mlock 73732096-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Try increasing RLIMIT_MEMLOCK ('ulimit -l' as root).
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:    CPU buffer size =  70.31 MiB
llm_load_tensors:  Vulkan0 buffer size = 4095.05 MiB
.................................................................................................

Starting with 0.2.80, however, something is broken and GPU detection/usage is skipped entirely. The logs show only:

...
llm_load_tensors:    CPU buffer size = 4165.37 MiB
...

I also tested with the latest version, 0.2.87, and it is still broken. We are currently using 0.2.85 -> https://github.com/containers/ai-lab-recipes/blob/main/model_servers/llamacpp_python/src/requirements.txt#L1
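
To make the comparison between versions repeatable, the check can be scripted against captured server logs. The snippet below is only a rough sketch: it assumes the log lines keep the format quoted above, and the names `summarize_offload`, `GOOD_LOG` and `BAD_LOG` are made up for illustration (they are not part of llama_cpp_python).

```python
import re

# Patterns based on the log lines quoted in this issue; other llama.cpp
# versions may word these lines differently (assumption).
OFFLOAD_RE = re.compile(r"llm_load_tensors: offloaded (\d+)/(\d+) layers to GPU")
BUFFER_RE = re.compile(r"llm_load_tensors:\s+(\S+) buffer size =\s*([\d.]+) MiB")


def summarize_offload(log_text):
    """Report which buffers were allocated and whether a GPU was used."""
    match = OFFLOAD_RE.search(log_text)
    offloaded = (int(match.group(1)), int(match.group(2))) if match else None
    # Map buffer name (e.g. "CPU", "Vulkan0") to its size in MiB.
    buffers = {name: float(size) for name, size in BUFFER_RE.findall(log_text)}
    # Treat any non-CPU buffer, or any offloaded layer, as evidence of GPU use.
    gpu_used = any(name != "CPU" for name in buffers) or bool(
        offloaded and offloaded[0] > 0
    )
    return {"offloaded": offloaded, "buffers": buffers, "gpu_used": gpu_used}


# Sample lines copied from the logs above (hypothetical variable names).
GOOD_LOG = """\
llm_load_tensors: ggml ctx size =  0.30 MiB
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:    CPU buffer size =  70.31 MiB
llm_load_tensors:  Vulkan0 buffer size = 4095.05 MiB
"""
BAD_LOG = "llm_load_tensors:    CPU buffer size = 4165.37 MiB\n"

if __name__ == "__main__":
    for label, log in (("0.2.79", GOOD_LOG), ("0.2.80+", BAD_LOG)):
        print(label, summarize_offload(log))
```

Running something like this over the startup log of each image build would flag the regression as soon as only a CPU buffer is reported.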

lstocchi mentioned this issue Aug 8, 2024

fix: revert llama cpp python server to 0.2.79 to enable gpu containers/podman-desktop-extension-ai-lab-playground-images#44

Merged
