DPR create_embeddings/update_embeddings FAISS is so slow! #5641
shahad2099 asked this question in Questions (unanswered; 1 comment, 4 replies)
Hi @shahad2099, happy to answer your questions! 🙂
Hello everyone,
I hope you're all doing well. I am using the Haystack framework to build a retriever. After training DPR, I wanted to use FAISS as my vector database. However, creating or updating embeddings is very slow! I have 3 million short documents (about 100 words per document), and processing just 17% of them took about 5 hours and 38 minutes. This is incredibly frustrating for me.
What should I do? Are there any optimizations I should implement in my code?
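For scale, here is a rough back-of-the-envelope estimate based only on the figures quoted above. It assumes throughput stays constant for the rest of the run, which may not hold in practice. Under that assumption the numbers work out to about 25 documents per second and roughly 33 hours for the full set.

```python
# Figures quoted in the post
total_docs = 3_000_000
fraction_done = 0.17
elapsed_seconds = (5 * 60 + 38) * 60  # 5 h 38 min expressed in seconds

# Documents embedded so far and the implied throughput
docs_done = total_docs * fraction_done
docs_per_second = docs_done / elapsed_seconds

# Projected wall-clock time for all documents, assuming constant throughput
estimated_total_hours = elapsed_seconds / fraction_done / 3600

print(f"{docs_per_second:.1f} docs/s, ~{estimated_total_hours:.1f} h for the full set")
```

A throughput this low for 100-word documents would be consistent with encoding on CPU, but the post does not say which device was used.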
I also have some simple questions:
1. If I want to add more data, do I have to store all the documents and create the embeddings from scratch, or can I add the new documents and create embeddings for those only?
2. In your opinion, what is the most efficient vector database for a retriever?
3. Does the update_embeddings function use GPUs or multi-threading?
Many thanks to all of you!
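To make question 1 concrete, here is a minimal sketch of the "embed only the new documents" idea, independent of Haystack. Everything in it is hypothetical and for illustration only: `embed_missing`, the dictionary-based store, and `toy_embed_batch` (a stand-in for a real DPR passage encoder). In Haystack 1.x, `update_embeddings` accepts an `update_existing_embeddings` flag that is meant to serve the same purpose when set to `False`, but it is worth checking the documentation for the installed version.

```python
from typing import Callable, Dict, List

Vector = List[float]


def embed_missing(
    docs: Dict[str, str],
    embeddings: Dict[str, Vector],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 2,
) -> int:
    """Embed only documents that have no stored vector; return how many were embedded."""
    pending = [doc_id for doc_id in docs if doc_id not in embeddings]
    # Process in batches so the encoder sees several texts per call
    for start in range(0, len(pending), batch_size):
        batch_ids = pending[start:start + batch_size]
        vectors = embed_batch([docs[doc_id] for doc_id in batch_ids])
        for doc_id, vector in zip(batch_ids, vectors):
            embeddings[doc_id] = vector
    return len(pending)


def toy_embed_batch(texts: List[str]) -> List[Vector]:
    # Hypothetical stand-in for a real encoder: [character count, word count]
    return [[float(len(t)), float(t.count(" ") + 1)] for t in texts]


docs = {"a": "first document", "b": "second one"}
embeddings: Dict[str, Vector] = {}

first_run = embed_missing(docs, embeddings, toy_embed_batch)   # embeds "a" and "b"
docs["c"] = "a newly added document"
second_run = embed_missing(docs, embeddings, toy_embed_batch)  # embeds only "c"

print(first_run, second_run)
```

The second call touches only the newly added document, so the cost of adding data scales with the number of new documents rather than with the size of the whole collection.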