Replies: 1 comment · 1 reply
Hi, my idea for a RAG project was to feed only relevant documents into the prompt for the LLM. So I'm trying to find a significant difference in the returned scores. I followed the tutorial and have something like this:
```python
from haystack import Document
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever

# In-memory FAISS index ("Flat" = exact search), metadata stored in SQLite
document_store = FAISSDocumentStore(
    faiss_index_factory_str="Flat", sql_url="sqlite:////tmp/faiss_document_store.db"
)

# One clearly relevant document and one nonsense document
documents = [
    Document(content="The english channel is 30 kilometers wide."),
    Document(content="la le li la di da"),
]
document_store.write_documents(documents)

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)
document_store.update_embeddings(retriever)

my_query = "How wide is the english channel?"
docs = retriever.retrieve(query=my_query, top_k=2)
print(docs)
```
However, the difference in score is pretty small: 0.5734 vs. 0.5290. On my real text base I get nearly identical scores for documents that match perfectly and documents that don't match at all. My plan was to apply a general threshold... Am I misunderstanding something, or is there a better approach?
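For illustration, the kind of thresholding I have in mind would look roughly like this (assuming Haystack 1.x; I read that `scale_score=True`, the retriever's default, squashes raw similarities into [0, 1], which may be why everything clusters around 0.5, so this sketch turns it off and uses cosine similarity instead; the cutoff value and the db path are arbitrary placeholders, not recommendations):

```python
from haystack import Document
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever

# Hypothetical separate store; similarity="cosine" keeps raw scores interpretable
document_store = FAISSDocumentStore(
    faiss_index_factory_str="Flat",
    sql_url="sqlite:////tmp/faiss_threshold_demo.db",  # placeholder path
    similarity="cosine",
)
document_store.write_documents([
    Document(content="The english channel is 30 kilometers wide."),
    Document(content="la le li la di da"),
])

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
    scale_score=False,  # return raw cosine similarity instead of scaled scores
)
document_store.update_embeddings(retriever)

docs = retriever.retrieve(query="How wide is the english channel?", top_k=10)

# Keep only documents above an arbitrary, corpus-dependent cutoff
# before building the LLM prompt.
SCORE_THRESHOLD = 0.5  # placeholder value, would need tuning on real data
valid_docs = [d for d in docs if d.score is not None and d.score >= SCORE_THRESHOLD]
for d in valid_docs:
    print(f"{d.score:.4f}  {d.content[:60]}")
```

Even with raw scores, though, I'm unsure whether a single global cutoff can be reliable across different embedding models and corpora, which is really the heart of my question.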
-

Hello, @Peveld! Two quick ideas: