Update to use GPU-accelerated hardware instead of CPU-bound with gpt4all #6
Update: GPT4All 2.5.2 with snoozy fails the "life well-lived" test.
My quick attempt at converting groovy from ggml to gguf using the llama.cpp conversion utility did not work. It looks like this is a known consequence of the swap to gguf, but I didn't have time today to investigate further. I did find the Nomic.ai model card for GPT4All-J helpful in explaining the specific iterations that led to groovy. New idea to test:
Hey @misslivirose , I got curious about making the project work on GPU, so I spent some of my Sunday evening investigating the issue. I managed to make the whole thing work with GPU by updating some of the dependencies and making a few required changes. Feel free to look at my branch at https://github.com/tomjorquera/privateGPT/tree/gpu-acceleration and to pull as you please. Sadly it's not all rosy; I hit some snags along the road (more on that below).
How to use with GPU
I tested the changes with both LlamaCpp and GPT4All models, both with and without GPU, and it seems to work well on my side.
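Roughly, the idea is to pass the GPU options through the langchain LLM wrappers. Here is a minimal sketch of what that looks like, not the exact diff from my branch: model paths and n_gpu_layers are placeholders, and the device field assumes a langchain/gpt4all combination recent enough to expose it.

```python
# Minimal sketch: enabling GPU inference through langchain's LLM wrappers.
# Model paths and n_gpu_layers are placeholder values, not taken from the branch.
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import GPT4All, LlamaCpp

callbacks = [StreamingStdOutCallbackHandler()]

# llama.cpp backend: offload model layers to the GPU.
llm_llama = LlamaCpp(
    model_path="models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=32,  # how many layers to offload; tune to fit available VRAM
    n_batch=512,
    callbacks=callbacks,
)

# gpt4all backend: recent builds expose a Vulkan-backed "gpu" device.
llm_gpt4all = GPT4All(
    model="models/gpt4all-falcon-newbpe-q4_0.gguf",  # placeholder path
    device="gpu",  # assumes the wrapper/bindings support device selection
    callbacks=callbacks,
)
```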
Update: My fix for GPT4All streaming has been released with langchain v0.1.3. I updated my branch to the latest version (0.1.4) and made use of the relevant option, so GPT4All streaming works again 🙂
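For reference, wiring up streaming through the wrapper looks roughly like this; a small sketch assuming langchain 0.1.3+ and a placeholder model path:

```python
# Sketch: token streaming with the langchain GPT4All wrapper (langchain >= 0.1.3).
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import GPT4All

llm = GPT4All(
    model="models/gpt4all-falcon-newbpe-q4_0.gguf",  # placeholder path
    streaming=True,  # emit tokens through the callback as they are generated
    callbacks=[StreamingStdOutCallbackHandler()],
)

llm.invoke("What is the meaning of a life well-lived?")
```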
Memory Cache should use an available GPU for inference in order to speed up queries and deriving insights from documents.
What I tried so far
I spent a few days last week exploring the differences between the primordial privateGPT version and the latest one. One of the major differences is that the newer project includes support for GPU inference for llama and gpt4all. The challenge I ran into with the newer version is that moving from the older groovy ggml model (no longer supported now that privateGPT uses the .gguf format) to llama doesn't produce the same results when ingesting the same local file store and querying it.
This might be a matter of how RAG is implemented, something about how I set things up on my local machine, or a function of model choice.
I've lazily tried to see whether this can be resolved through dependency changes, but I haven't had luck finding a version that runs and supports .ggml and GPU acceleration together. From what I can tell, Nomic introduced GPU support in gpt4all 2.4 (latest is 2.5+), but it's unclear whether there's a way to get this working cleanly with minimal changes to how my fork of privateGPT uses langchain to import the gpt4all package. It's also unclear to me whether this works on Ubuntu or only through the Vulkan API; I need to do some additional investigation.
I did get CUDA installed and verified that my GPU is properly detected and set up to run the sample projects provided by Nvidia.
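To sanity-check the gpt4all GPU/Vulkan question directly, independent of langchain, something like the following sketch should show whether the bindings can see the GPU at all (model name and prompt are placeholders):

```python
# Rough sketch: probe gpt4all's GPU backend directly via the Python bindings.
# device="gpu" selects the Vulkan-backed device in gpt4all >= 2.x; the exact
# failure mode without a usable GPU may vary, so fall back to CPU generically.
from gpt4all import GPT4All

MODEL = "mistral-7b-instruct-v0.1.Q4_0.gguf"  # placeholder model name

try:
    model = GPT4All(MODEL, device="gpu")
    print("GPU backend initialized")
except Exception as err:
    print(f"GPU backend unavailable, falling back to CPU: {err}")
    model = GPT4All(MODEL, device="cpu")

print(model.generate("What is the meaning of a life well-lived?", max_tokens=128))
```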
What's next
Testing
I've been using a highly subjective test to evaluate:
Prompt: "What is the meaning of a life well-lived?"
Primordial privateGPT + groovy, augmented with my local files, consistently answers this question with a combination of "technology and community". No other model/project combination has replicated that consistently.
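For repeatability, the check could be scripted against a primordial-style chain along these lines; a rough sketch with placeholder paths, model, and embedding choice, not the exact setup:

```python
# Rough sketch of scripting the "life well-lived" check against a
# primordial-style privateGPT setup: a persisted Chroma store plus a local LLM.
# Paths, model names, and the embedding model are placeholders.
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import GPT4All
from langchain_community.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma(persist_directory="db", embedding_function=embeddings)
llm = GPT4All(model="models/ggml-gpt4all-j-v1.3-groovy.bin")  # placeholder

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 4}),
)

result = qa.invoke({"query": "What is the meaning of a life well-lived?"})
print(result["result"])
```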