MostSimilarDocumentsPipeline for Retrieving Similar Sentences #3299

Hi,

The MostSimilarDocumentsPipeline just takes the already calculated embedding of a document and uses that vector to find other documents that are close to it. You can achieve the same thing using the DocumentSearchPipeline and the EmbeddingRetriever.

You simply pass in the text of the sentence you want to find similar documents for. The retriever creates the embedding for that sentence and then runs the same query_by_embedding method that the MSD pipeline uses.

Here is a simple outline of the code:

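# Import paths as in Haystack 1.x
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import EmbeddingRetriever

document_store = ElasticsearchDocumentStore(similarity='cosine')
retriever = EmbeddingRetriever(document_store=document_store, embedding_model='sentence-transformers/all-mpnet-base-v2')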
search_pi…
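
To make the idea concrete, here is a small standalone sketch of what "querying by embedding" with cosine similarity boils down to. It does not use Haystack or Elasticsearch at all: the function `query_by_embedding_toy`, the document IDs, and the 3-dimensional vectors are all made up for illustration, and a real sentence-transformers model would produce much larger vectors. Treat it as a rough picture of the ranking step, not as what the library does internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for all-zero vectors
    return dot / (norm_a * norm_b)


def query_by_embedding_toy(query_emb, doc_embs, top_k=2):
    """Hypothetical helper: rank stored embeddings against a query embedding.

    doc_embs maps a document ID to its embedding vector.
    Returns the top_k (doc_id, score) pairs, best match first.
    """
    scored = [
        (doc_id, cosine_similarity(query_emb, emb))
        for doc_id, emb in doc_embs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings standing in for vectors already stored in a document store.
doc_embs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

# Made-up embedding standing in for the encoded query sentence.
query_emb = [0.8, 0.2, 0.0]

results = query_by_embedding_toy(query_emb, doc_embs, top_k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The only difference between the two pipelines in this picture is where `query_emb` comes from: the MSD pipeline reuses the embedding of an existing document, while the retriever-based approach encodes your sentence first and then runs the same kind of ranking.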

Answer selected by sankalp-acl