-
Hi @msarro, I'm sorry to hear that. We are aware that the REST API implementation is not the best. For that reason, we have a roadmap item, Shape Requirements for REST API, where we collect requirements to improve the implementation. In the meantime, to give better guidance on the current implementation, I updated the documentation according to your feedback. Feel free to check it and let me know if something is missing: REST API. If you need more info about the Haystack API, you can also check this tutorial: Using Haystack with REST API. The tutorial uses Docker but explains the pipeline YAML structure in detail. Regarding your question, you don't need to change anything in a Python file. What we miss in the documentation is that you needed to install
Pipeline YAML Example with PromptNode & PromptTemplate
version: '1.19.0'
components:
  - name: DocumentStore
    type: ElasticsearchDocumentStore
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 2 # The number of results to return
  - name: qa_template
    type: PromptTemplate
    params:
      output_parser:
        type: AnswerParser
      prompt: "Given the context please answer the question. Context: {join(documents)}; \
        Question: {query}; \
        Answer:
        "
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 50 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: google/flan-t5-base
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: EmbeddingRetriever
        inputs: [Query]
      - name: PromptNode
        inputs: [EmbeddingRetriever]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]

Hope this answer helps 🙌 If you'd like to share more information about your Docker setup, your pipeline YAML file, and the full error you get, I'm happy to help you solve the Docker error.
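Since the `pipelines` section requires each node to name its input, one cheap sanity check before starting the server is to confirm that every node refers to a declared component and that every input is either a root node (`Query`, `File`) or a node placed earlier in the same pipeline. The sketch below is only an illustration: `check_wiring` is a hypothetical helper, not part of Haystack, and it assumes the YAML has already been loaded into a plain Python dict (here a trimmed-down copy of the `query` pipeline is written out by hand).

```python
import re

ROOT_NODES = {"Query", "File"}  # root inputs used by the pipelines in this thread


def check_wiring(config: dict) -> list[str]:
    """Hypothetical helper: return wiring problems; an empty list means none were found."""
    declared = {component["name"] for component in config.get("components", [])}
    problems = []
    for pipeline in config.get("pipelines", []):
        seen = set()  # nodes already placed earlier in this pipeline
        for node in pipeline["nodes"]:
            if node["name"] not in declared:
                problems.append(
                    f"{pipeline['name']}: node '{node['name']}' is not a declared component"
                )
            for source in node["inputs"]:
                # 'FileTypeClassifier.output_1' -> 'FileTypeClassifier'
                base = re.sub(r"\.output_\d+$", "", source)
                if base not in ROOT_NODES and base not in seen:
                    problems.append(
                        f"{pipeline['name']}: '{node['name']}' reads from unknown input '{source}'"
                    )
            seen.add(node["name"])
    return problems


# Hand-written, trimmed-down mirror of the `query` pipeline (an assumption for the demo).
config = {
    "components": [
        {"name": name} for name in ("DocumentStore", "EmbeddingRetriever", "PromptNode")
    ],
    "pipelines": [
        {
            "name": "query",
            "nodes": [
                {"name": "EmbeddingRetriever", "inputs": ["Query"]},
                {"name": "PromptNode", "inputs": ["EmbeddingRetriever"]},
            ],
        },
    ],
}
print(check_wiring(config))  # prints []
```

A misspelled node name or an input that points at a node defined later would show up as an entry in the returned list instead of as an opaque startup failure.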
-
Hi @msarro, I've also run into some issues with the API before. Here are a few things you can try that have helped me. First, if you can get just the Elasticsearch Docker container running (
If you cannot get those working, here's a very basic, stripped-down implementation to get you started with setting up your own API...
Project Structure
I would also usually throw in a routers folder that defines all the routes, but for this example I will just define the endpoints in the main application file. Technically, the only two files required here are the first two. Nothing is stopping you from defining everything in one huge file, but that would just be madness...
application.py
config.py
models.py
askQuestion.py
The server can then be started from the parent directory that contains the demo_api folder with:
Hope this helps! If you run into any issues or need help, I'd be happy to assist in any way I can.
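As a rough illustration of the general shape a stripped-down query API can take, here is a sketch that uses only Python's standard library rather than a web framework. Everything in it is an assumption made for the example: the `/query` path, the request and response fields, and especially `answer_question`, which is a hypothetical placeholder where a call into your own Haystack pipeline would go.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer_question(query: str) -> dict:
    """Hypothetical placeholder: replace the body with a call to your own pipeline."""
    return {"query": query, "answers": []}


def handle_query(body: bytes) -> tuple[int, dict]:
    """Validate a JSON request body and return (HTTP status, JSON-serialisable payload)."""
    try:
        data = json.loads(body or b"{}")
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "request body must be valid JSON"}
    query = data.get("query") if isinstance(data, dict) else None
    if not isinstance(query, str) or not query.strip():
        return 422, {"error": "'query' must be a non-empty string"}
    return 200, answer_question(query)


class QueryHandler(BaseHTTPRequestHandler):
    """Routes POST /query to handle_query and writes the result back as JSON."""

    def do_POST(self) -> None:
        if self.path != "/query":
            status, payload = 404, {"error": "unknown endpoint"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_query(self.rfile.read(length))
        encoded = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        self.wfile.write(encoded)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Start the server; blocks until interrupted."""
    HTTPServer((host, port), QueryHandler).serve_forever()
```

Calling `serve()` starts the server locally. Keeping the request validation in `handle_query`, separate from the HTTP handler, means the logic can be exercised without opening a socket, which helps when the only symptom is an error with no stack trace.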
-
Context:
I am building an ingest/QA API with Haystack using DPR, which works. I am attempting to add PromptNode/PromptTemplate. However, when running Docker, everything seems to load until the very end, when I get an error that essentially says, "something bad happened, check the stack trace", except no stack trace is generated, or if it is, it isn't included in the Docker output. So I am trying to re-implement it as pure Python to run locally without Docker/YAML.
The problem is that the documentation's entire explanation of how to do this on the REST API page is:
But that doesn't work. I get a ton of errors. There isn't any real example of which libraries need to be imported, how to structure the file, which variables/functions need to be defined, etc.
The closest I can find is code here:
https://github.com/deepset-ai/haystack/blob/main/rest_api/rest_api/application.py
which provides a few breadcrumbs, but still doesn't really work.
There is no sample code showing how the source should be structured to be compatible with instantiation this way. Google seems to point me towards a bunch of examples using Flask or FastAPI.
Considering how awesome most of Haystack's code is, this is surprisingly devoid of detail.
Are there any better example implementations showing how to structure a Python file so it can be instantiated this way and run as a REST API without Docker?