FastRAG is a simple Retrieval-Augmented Generation (RAG) application optimized for fast performance on general-grade PCs. It provides a chatbot interface that leverages vector-based search and large language models (LLMs) for answering questions and interacting with document-based data.
To get started with FastRAG locally, follow these steps:
-
Clone the repository:
git clone https://github.com/bibekyess/FastRAG.git
-
Navigate to the project directory:
cd FastRAG
-
Build and launch the containers:
docker compose up --build
This will start the FastRAG API and demo with all necessary services.
The FastRAG application launches several API endpoints for different purposes:
-
Get Conversation History
- Method:
GET
- Endpoint:
/conversation-history
- Parameters:
collection_name
(str): Name of the collection to fetch history from.limit
(int): Number of history entries to return. Default is 10.
- Method:
-
Add to Conversation History
- Method:
POST
- Endpoint:
/conversation-history
- Body:
collection_name
(str): Name of the collection to fetch history from.query
(str): User input queryresponse_text
(str): AI response
- Method:
-
Parse Document
- Method:
POST
- Endpoint:
/parse
- Parameters:
file
(UploadFile): The document to be parsed.index_id
(str): Index name for the document. Default isfiles
.splitting_type
(Literal['raw', 'md']): Splitting type for the document. Default israw
(based on chunk settings).
- Method:
-
Chat with the Bot
- Method:
POST
- Endpoint:
/chat
- Body:
user_input
(str): The user's query.index_id
(str): The index to search. Default is"files"
.llm_text
(str): The LLM model to use. Default is"local"
.dense_top_k
(int): The number of top results to return from the vector search. Default is 5.upgrade_user_input
(bool): Flag to indicate whether to upgrade the user input from conversation history. Default isFalse
.stream
(bool): Flag to enable streaming of results. Default isTrue
.
- Method:
- Gradio UI: FastRAG features a simple Gradio-based user interface for interacting with the chatbot.
- Real-time Chat: Users can upload a document and ask questions in real-time, with previous conversations stored and utilized for context-based improvements. [Providing the option to upload document is in progress]
- QdrantDB: The vector embeddings and chatbot conversation history are stored in QdrantDB. This allows the chatbot to utilize previous conversation context for improved responses.
- UI Display: Latency of the chatbot's response is displayed in the Gradio interface.
- Logging: Detailed logs of latency and other events are saved for debugging and performance monitoring.
FastRAG offers multiple options for segmenting documents into chunks:
- Raw Format: This option allows experimenting with various chunk sizes, strides, and overlapping settings for raw text parsing.
- Markdown Format: This method segments the document based on semantic information, creating more context-aware chunks.