Implement chunking #2136
Comments
What do you want to do with chunking? Right now, chunking is by sentence, so each document should generate N chunks that should be interoperable. If not enough chunks are being retrieved, you should increase similarity_top_k in ChatService.
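privateGPT's retrieval layer is built on llama-index, so the suggestion above amounts to a one-parameter change. A minimal sketch of raising the retrieved-chunk count (the directory path, query, and the value 10 are illustrative assumptions; in recent privateGPT versions the same knob is typically exposed as rag.similarity_top_k in settings.yaml):

```python
# Hedged sketch: retrieve more chunks per query so more sources can be cited.
# Assumes llama-index >= 0.10 and a configured embedding model.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("docs").load_data()  # "docs" is a placeholder path
index = VectorStoreIndex.from_documents(documents)

# llama-index's default similarity_top_k is small (2); raise it explicitly.
retriever = index.as_retriever(similarity_top_k=10)
for hit in retriever.retrieve("example query"):
    print(round(hit.score or 0.0, 3), hit.node.metadata.get("file_name"))
```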
I have already updated all parameters so that privateGPT quotes at least 10 sources. Currently, I have implemented:
Chunking: sentence chunking
LLM: Llama 3.1 8B
Embedding: nomic-embed-text
Vector storage: Qdrant
Context window: 32000
I am still getting 4 sources. I want to maximise my sources. How do I achieve this?
I have tried multiple things and asked in the Discord community, but no one has helped.
The LLM or context window won't change anything in the search part. Can you change or comment out the similarity value to check? You can change it in settings.yaml. If that doesn't work, you should change your embedding model to another one with more spatial capacity.
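One mechanism worth ruling out: if a similarity cutoff is applied after retrieval, chunks scoring below the threshold are dropped no matter how high similarity_top_k is set, which would explain requesting 10 sources but getting 4. A self-contained sketch of that pruning using llama-index's SimilarityPostprocessor (the scores and the 0.45 threshold are made up for illustration):

```python
# Sketch: a similarity cutoff silently prunes low-scoring chunks.
# The scores and threshold below are fabricated for illustration.
from llama_index.core.schema import NodeWithScore, TextNode
from llama_index.core.postprocessor import SimilarityPostprocessor

# Fake retrieval results with descending similarity scores.
nodes = [
    NodeWithScore(node=TextNode(text=f"chunk {i}"), score=s)
    for i, s in enumerate([0.82, 0.61, 0.52, 0.47, 0.41, 0.38])
]

cutoff = SimilarityPostprocessor(similarity_cutoff=0.45)
kept = cutoff.postprocess_nodes(nodes)
print(f"{len(nodes)} retrieved, {len(kept)} kept")  # 6 retrieved, 4 kept
# Lowering or removing the cutoff lets lower-scoring chunks (often from
# other documents) reach the LLM as additional sources.
```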
The similarity value is already disabled.
this could be a possible way (a sketch of points 3 and 4 follows the list):

3. Embed smaller, granular chunks
4. Enhance the embedding model
5. Combine embeddings across documents
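A sketch of points 3 and 4 under llama-index, with the embedding model and chunk sizes as assumptions rather than recommendations:

```python
# Sketch: smaller, more granular chunks plus a swapped-in embedding model.
# Chunk sizes (in tokens) and the model name are illustrative choices.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Smaller chunks -> more distinct chunks per document -> more candidates
# from different files competing for the top-k slots.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
splitter = SentenceSplitter(chunk_size=256, chunk_overlap=32)

documents = SimpleDirectoryReader("docs").load_data()
nodes = splitter.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)
```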
Question
I want to ingest 150-200 files of 15-20 pages each, query them, and have answers generated from multiple files. Presently it is quoting just 2 sources. Is chunking the way out? How do I implement it, and what would the code look like?
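For the specific stack described in this thread (Llama 3.1 served by Ollama, nomic-embed-text, sentence chunking), a hedged end-to-end sketch; the corpus path, chunk sizes, and query are assumptions:

```python
# End-to-end sketch matching the stack described above; assumes an Ollama
# server with the named models pulled. Paths and parameters are illustrative.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(model="llama3.1")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

docs = SimpleDirectoryReader("my_corpus").load_data()  # the 150-200 files
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
index = VectorStoreIndex(splitter.get_nodes_from_documents(docs))

# Ask for enough chunks that several different files can contribute sources.
engine = index.as_query_engine(similarity_top_k=10)
response = engine.query("Summarize the common findings across the reports.")
for src in response.source_nodes:
    print(src.node.metadata.get("file_name"), round(src.score or 0.0, 3))
```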