
Let embedding model run on GPU #71

Open
ThiloteE opened this issue Jul 5, 2024 · 8 comments
Labels
important-decision (This issue contains an important architectural decision) · status: freeze (Issues postponed to a (much) later future) · type: enhancement (New feature or request)
Milestone

Comments

ThiloteE (Collaborator) commented Jul 5, 2024

Historical "what the fuck" is available at JabRef#11430 (comment)

[screenshot]

Advantages:

  • For LLMs, a GPU is much faster than a CPU: often 10x or more, depending on the hardware.

Disadvantages:

  • Please correct me if I am wrong, but I expect additional dependencies are required for a GPU backend (e.g. llama.cpp, Nvidia drivers and CUDA toolkit libraries, Vulkan, ROCm, SYCL, ...).
ThiloteE added the type: enhancement and important-decision labels on Jul 5, 2024
ThiloteE (Collaborator, Author) commented Jul 5, 2024

If implemented, let users choose the backend and the hardware (CPU vs. GPU, and which GPU: GPU1, GPU2, GPU3, ...) in the preferences.
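
A rough sketch of how such a preference could be modeled is below. This is purely hypothetical: none of these types exist in JabRef, and all names are made up for illustration.

```java
// Hypothetical sketch only: these types do not exist in JabRef yet.
// They illustrate one way to let users pick the backend and the device.
public record EmbeddingHardwarePreference(Backend backend, int gpuIndex) {

    public enum Backend { CPU, CUDA, VULKAN, ROCM }

    // Default preference: run on the CPU, no GPU selected.
    public static EmbeddingHardwarePreference cpuDefault() {
        return new EmbeddingHardwarePreference(Backend.CPU, -1);
    }
}
```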

InAnYan (Owner) commented Jul 17, 2024

Currently, in-process embedding models in langchain4j (meaning models that run locally on the computer) run only on the CPU. There is an open langchain4j issue about running embedding models on the GPU, but it is not resolved yet.
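
For reference, current in-process usage looks roughly like this minimal sketch (CPU only). It assumes the langchain4j-embeddings-all-minilm-l6-v2 dependency; the exact package of the model class varies between langchain4j versions.

```java
// Minimal sketch of the current, CPU-only in-process embedding in langchain4j.
// Assumes the langchain4j-embeddings-all-minilm-l6-v2 artifact; the package of
// AllMiniLmL6V2EmbeddingModel differs between langchain4j versions.
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.model.output.Response;

public class InProcessEmbeddingCpuExample {
    public static void main(String[] args) {
        // The bundled ONNX model is executed in-process; there is currently
        // no option to move this work onto a GPU.
        AllMiniLmL6V2EmbeddingModel model = new AllMiniLmL6V2EmbeddingModel();
        Response<Embedding> response = model.embed("JabRef is a reference manager.");
        System.out.println("Embedding dimension: " + response.content().dimension());
    }
}
```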

In order to implement this, we have these choices:

  1. Wait for the implementation in langchain4j: simpler to develop and better from an architectural point of view.
  2. Write the fix for langchain4j ourselves: good.
  3. Use external modules and write all the support code ourselves in JabRef: the fastest way.

It's a very good idea and we should look into it, but probably a bit later, once we finally release the AI chat and maybe add summarization.

I'll mark the issue as low-priority, but it's only low priority in this context: week 1 and the first release.

InAnYan added and then removed the Priority: low (This issue is not very important (right now)) label on Jul 17, 2024
InAnYan (Owner) commented Jul 17, 2024

Actually, no. I'll remove the low-priority label and won't assign a milestone.

ThiloteE added the status: freeze (Issues postponed to a (much) later future) label on Jul 17, 2024
koppor added this to the Week 12 milestone on Aug 2, 2024
koppor (Collaborator) commented Aug 2, 2024

I am collecting it in the final "anything else" milestone, "final polishing" 😅

ThiloteE (Collaborator, Author) commented Aug 2, 2024

ThiloteE (Collaborator, Author) commented Aug 7, 2024

GPU support with the Deep Java Library (DJL): https://docs.djl.ai/engines/onnxruntime/onnxruntime-engine/index.html#install-gpu-package. Unfortunately, DJL also relies on Microsoft's ONNX Runtime, which seems to be very slow. I assume models would also need to be ONNX-compatible, which is a problem because not many models on Hugging Face are uploaded in the ONNX file format.
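
If we went the DJL route, a GPU-backed embedding could look roughly like the sketch below. This is untested: it assumes the onnxruntime-engine artifact with the GPU-enabled com.microsoft.onnxruntime:onnxruntime_gpu dependency (as described in the linked install docs) plus the DJL Hugging Face tokenizers extension, and the model URL and class names follow the DJL examples.

```java
// Untested sketch based on the DJL documentation. Assumes:
//  - ai.djl.onnxruntime:onnxruntime-engine with the GPU-enabled
//    com.microsoft.onnxruntime:onnxruntime_gpu swapped in for the CPU-only artifact,
//  - the DJL Hugging Face tokenizers extension for TextEmbeddingTranslatorFactory.
import ai.djl.Device;
import ai.djl.huggingface.translator.TextEmbeddingTranslatorFactory;
import ai.djl.inference.Predictor;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class DjlGpuEmbeddingSketch {
    public static void main(String[] args) throws Exception {
        Criteria<String, float[]> criteria = Criteria.builder()
                .setTypes(String.class, float[].class)
                // Example ONNX model from the DJL Hugging Face model zoo.
                .optModelUrls("djl://ai.djl.huggingface.onnxruntime/sentence-transformers/all-MiniLM-L6-v2")
                .optEngine("OnnxRuntime")
                // The relevant part: request the GPU instead of the default CPU.
                .optDevice(Device.gpu())
                .optTranslatorFactory(new TextEmbeddingTranslatorFactory())
                .build();

        try (ZooModel<String, float[]> model = criteria.loadModel();
             Predictor<String, float[]> predictor = model.newPredictor()) {
            float[] embedding = predictor.predict("JabRef is a reference manager.");
            System.out.println("Embedding dimension: " + embedding.length);
        }
    }
}
```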

koppor (Collaborator) commented Aug 29, 2024

At least one can paint everything blue in the CPU utilization graph:

[screenshot: CPU utilization graph]

ThiloteE (Collaborator, Author) commented Oct 1, 2024

One solution for providing GPU acceleration for LLMs (NOT necessarily for embedding models!) is to provide proper support for the OpenAI API; see issue JabRef#11872. By using external applications such as llama.cpp, GPT4All, LM Studio, Ollama, Jan, KoboldCpp, etc., which already provide GPU acceleration, there is no need to add and maintain this feature in JabRef itself. It would still be nice to have GPU acceleration for embedding models, though. Maybe do it like KoboldCpp and only provide a Vulkan backend, which is much, much smaller than a CUDA backend (~1.5 GB in PyTorch; 200-500 MB in llama.cpp).

[screenshot]
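
To illustrate the OpenAI-API route: once the base URL is configurable, langchain4j can talk to any local, GPU-accelerated OpenAI-compatible server. A minimal sketch follows; the URL, port, model name, and API key are placeholders, and the builder methods match the pre-1.0 langchain4j OpenAI module (they may differ in newer versions).

```java
// Sketch: delegate GPU work to a local OpenAI-compatible server
// (llama.cpp server, Ollama, GPT4All, KoboldCpp, ...). Base URL, model name
// and API key are placeholders; method names follow the pre-1.0 langchain4j
// OpenAI module and may differ in newer versions.
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class LocalOpenAiCompatibleSketch {
    public static void main(String[] args) {
        ChatLanguageModel model = OpenAiChatModel.builder()
                .baseUrl("http://localhost:11434/v1") // the local server does the GPU inference
                .apiKey("not-needed-for-local-servers")
                .modelName("llama3")
                .build();

        System.out.println(model.generate("Summarize what JabRef does in one sentence."));
    }
}
```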
