Add documentation for ChromadbRM in DSPy retrievals documentation
nerlfield committed Feb 16, 2024
1 parent 7dab4bb commit a9ef3e1
Showing 1 changed file with 64 additions and 1 deletion.
65 changes: 64 additions & 1 deletion docs/retrieval_models_client.md
@@ -8,6 +8,7 @@ This documentation provides an overview of the DSPy Retrieval Model Clients.
| --- | --- |
| ColBERTv2 | [ColBERTv2 Section](#ColBERTv2) |
| AzureCognitiveSearch | [AzureCognitiveSearch Section](#AzureCognitiveSearch) |
| ChromadbRM | [ChromadbRM Section](#ChromadbRM) |

## ColBERTv2

@@ -91,4 +92,66 @@ class AzureCognitiveSearch:

Refer to [ColBERTv2](#ColBERTv2) documentation. Keep in mind there is no `simplify` flag for AzureCognitiveSearch.

AzureCognitiveSearch supports sending queries and processing the received results, mapping content and scores to a correct format for the Azure Cognitive Search server.

## ChromadbRM

### Quickstart with OpenAI Embeddings

ChromadbRM can use any of the embedding functions described in the [chromadb embeddings documentation](https://docs.trychroma.com/embeddings). This example uses OpenAI embeddings.

```python
import os

from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
from dspy.retrieve.chromadb_rm import ChromadbRM

# Embed documents and queries with OpenAI; the API key is read from the environment.
embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ.get('OPENAI_API_KEY'),
    model_name="text-embedding-ada-002"
)

# Open the collection stored under the given persist directory.
retriever_model = ChromadbRM(
    'your_collection_name',
    '/path/to/your/db',
    embedding_function=embedding_function,
    k=5
)

# A k passed at call time takes precedence over the constructor value.
results = retriever_model("Explore the significance of quantum computing", k=5)

for result in results:
    print("Document:", result.long_text, "\n")
```

### Constructor

Initialize an instance of the `ChromadbRM` class, with the option to use OpenAI's embeddings or any alternative supported by chromadb, as detailed in the official [chromadb embeddings documentation](https://docs.trychroma.com/embeddings).

```python
ChromadbRM(
    collection_name: str,
    persist_directory: str,
    embedding_function: Optional[EmbeddingFunction[Embeddable]] = DefaultEmbeddingFunction(),
    k: int = 7,
)
```

**Parameters:**
- `collection_name` (_str_): The name of the chromadb collection.
- `persist_directory` (_str_): Path to the directory where chromadb data is persisted.
- `embedding_function` (_Optional[EmbeddingFunction[Embeddable]]_, _optional_): The function used for embedding documents and queries. Defaults to `DefaultEmbeddingFunction()` if not specified.
- `k` (_int_, _optional_): The number of top passages to retrieve. Defaults to 7.
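
To make the roles of `embedding_function` and `k` concrete, here is a toy sketch of embedding-based top-`k` retrieval in plain Python. It is not how chromadb or `ChromadbRM` is implemented: `toy_embed` (a bag-of-words counter), `cosine`, and `top_k` are hypothetical names chosen for illustration, and a real embedding function would return dense vectors produced by a model.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding function: lowercase word counts."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query: str,
    passages: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    k: int = 7,
) -> List[str]:
    """Rank passages by similarity to the query and keep the best k."""
    query_vector = embed(query)
    ranked = sorted(
        passages,
        key=lambda passage: cosine(query_vector, embed(passage)),
        reverse=True,
    )
    return ranked[:k]


passages = [
    "quantum computing uses qubits",
    "classical computing uses bits",
    "bananas are rich in potassium",
]
best = top_k("quantum computing", passages, k=2)
```

Swapping `toy_embed` for a different callable changes how similarity is measured without touching the ranking step, which is the same separation the `embedding_function` parameter provides.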

### Methods

#### `forward(self, query_or_queries: Union[str, List[str]], k: Optional[int] = None) -> dspy.Prediction`

Search the chromadb collection for the top `k` passages matching the given query or queries, using embeddings generated via the specified `embedding_function`.

**Parameters:**
- `query_or_queries` (_Union[str, List[str]]_): The query or list of queries to search for.
- `k` (_Optional[int]_, _optional_): The number of results to retrieve. If not specified, defaults to the value set during initialization.

**Returns:**
- `dspy.Prediction`: Contains the retrieved passages, each represented as a `dotdict` with a `long_text` attribute.
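
The argument handling described above can be sketched without chromadb installed. Everything in this stand-in is a hypothetical assumption made for illustration (`SketchRetriever`, `Passage`, the substring match, and the way hits from several queries are merged); it is not `ChromadbRM`'s code. It only shows a single query being treated like a one-element list, `k` falling back to the constructor value, and results exposing `long_text`.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Passage:
    """Hypothetical result type exposing the `long_text` attribute."""
    long_text: str


class SketchRetriever:
    """Illustrative stand-in that matches by substring instead of embeddings."""

    def __init__(self, documents: List[str], k: int = 7):
        self.documents = documents
        self.k = k

    def forward(
        self,
        query_or_queries: Union[str, List[str]],
        k: Optional[int] = None,
    ) -> List[Passage]:
        # Accept either a single query string or a list of queries.
        if isinstance(query_or_queries, str):
            queries = [query_or_queries]
        else:
            queries = list(query_or_queries)
        # Fall back to the value given at construction time.
        k = self.k if k is None else k
        hits: List[str] = []
        for query in queries:
            for document in self.documents:
                if query.lower() in document.lower() and document not in hits:
                    hits.append(document)
        return [Passage(long_text=text) for text in hits[:k]]


retriever = SketchRetriever(
    ["Quantum computing basics", "Quantum error correction", "Cooking pasta"],
    k=1,
)
default_hits = retriever.forward("quantum")        # uses k=1 from the constructor
two_hits = retriever.forward(["quantum"], k=2)     # per-call k overrides it
```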
