Let embedding model run on GPU #71
Comments
If implemented, let users choose the backend and hardware (CPU vs. GPU; GPU 1, GPU 2, GPU 3, ...) in the preferences.
In order to implement this, we have these choices:
It's a very good idea and we should look into it, but probably a bit later, when we finally release AI chat and maybe add summarization. I'll mark the issue as low-priority, but it's only low priority in this context: week 1 and the first release.
Actually, no, I'll remove it.
I'll collect it in the final "anything else" milestone, "final polishing" 😅
GPU support (for embedding models) with llama.cpp:
GPU support with Deep Java Library: https://docs.djl.ai/engines/onnxruntime/onnxruntime-engine/index.html#install-gpu-package. Unfortunately, they also use Microsoft's ONNX Runtime, which seems to be very slow. I assume models need to be compatible with ONNX too, and not many models on Hugging Face are uploaded in the ONNX file format!
One solution for providing GPU acceleration for LLMs (though not necessarily for embedding models!) is to provide proper support for the OpenAI API; see issue JabRef#11872. With external applications like llama.cpp, GPT4All, LM Studio, Ollama, Jan, KoboldCpp, etc. that already provide GPU acceleration, there is no need to add and maintain this feature in JabRef. It would still be nice to have GPU acceleration for embedding models, though. Maybe do it like KoboldCpp and only provide a Vulkan backend, which is much smaller than a CUDA backend (~1.5 GB in PyTorch; 200-500 MB in llama.cpp).
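To make the "delegate GPU work to an external server" idea concrete, here is a minimal sketch of what the JabRef side would look like: just an HTTP request to an OpenAI-compatible `/v1/embeddings` endpoint, with the server (llama.cpp, Ollama, LM Studio, ...) handling the GPU. The endpoint URL, port, and model name are assumptions for illustration, not anything JabRef ships today; the sketch only builds the request and does not send it, so it runs without a live server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class EmbeddingsRequestSketch {

    // Builds the JSON body for an OpenAI-style POST /v1/embeddings call.
    // (Naive string concatenation; a real implementation would use a JSON library.)
    static String buildEmbeddingsBody(String model, String input) {
        return "{\"model\": \"" + model + "\", \"input\": \"" + input + "\"}";
    }

    public static void main(String[] args) {
        // Hypothetical model name; whatever the external server has loaded.
        String body = buildEmbeddingsBody("nomic-embed-text", "JabRef is a reference manager.");

        // Hypothetical local server address; llama.cpp's server listens on 8080 by default.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/v1/embeddings"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
        // Actually sending it would be:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The appeal of this design is exactly what the comment above argues: JabRef only needs a URL and an HTTP client, while GPU selection, backend choice (CUDA/Vulkan/Metal), and model management stay in the external application.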
Historical "what the fuck" is available at JabRef#11430 (comment)
Advantages:
Disadvantages: