# Document Chatbot

A powerful document chatbot that combines vLLM, FastAPI, and Streamlit to provide intelligent responses based on uploaded documents. The system supports both OpenAI GPT and Llama 3 models, featuring real-time streaming responses and conversation memory.

## Features
- 📑 Document Processing: Upload and process PDF documents
- 💬 Intelligent Chat: Context-aware responses using vLLM
- 🔍 Document Search: Semantic search using FAISS
- 🚀 High Performance: Tensor parallelism support with vLLM
- 🌐 Modern Interface: Streamlit-based UI with real-time responses
- 📚 Multi-document Support: Chat with multiple uploaded documents
- 🔄 Context Retention: Maintains conversation context (work in progress; see the sketch after this list)
- 📈 Source Citations: Provides references for responses
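
Conversation memory is still in progress; below is a minimal sketch of the rolling-window approach a module like `app/core/memory_manager.py` could take (illustrative only, not the project's actual implementation):

```python
# Hypothetical rolling-window conversation memory; the real
# app/core/memory_manager.py may work differently.
from collections import deque


class ConversationMemory:
    """Keeps the most recent user/assistant turns for prompt context."""

    def __init__(self, max_turns: int = 10):
        # Each turn is one user message plus one assistant message.
        self.messages: deque = deque(maxlen=2 * max_turns)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def as_messages(self) -> list:
        # Chat-completion style message list, oldest first.
        return list(self.messages)
```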

## Demo

- 📝 Document Upload & Processing
- 💬 Interactive Chat Interface
- 🔍 Context-Aware Responses
- 📚 Source Citations
- ⚡ Real-time Processing

## Prerequisites

- NVIDIA GPU driver (version 530 or newer, to match CUDA 12.1)
- CUDA Toolkit >= 12.1
- Python 3.10+
- 16GB+ RAM
- Ubuntu 22.04 or later

## Tech Stack

- Backend: FastAPI
- Frontend: Streamlit
- Models: OpenAI GPT, Llama 3
- Inference Engine (GPU parallelism): vLLM
- Vector Store: FAISS
- Document Processing: PyPDF2, LangChain
- Containerization: Docker
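
To make the flow concrete, here is a minimal sketch of how these pieces interact: document chunks are embedded and indexed with FAISS, the chunks closest to a question are retrieved, and the question plus context goes to vLLM's OpenAI-compatible server. The embedding model, model name, and structure are assumptions, not the project's actual code:

```python
# Illustrative retrieval sketch; the project's real modules
# (document_processor.py, llm_manager.py) may differ.
import faiss
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # embedding model is an assumption
chunks = [
    "The report was published in 2024.",
    "Revenue grew 12% year over year.",
    "The authors recommend further testing.",
]

# Index the chunk embeddings in FAISS (inner product on normalized vectors = cosine).
vectors = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))

# Retrieve the chunks most similar to the question.
question = "How did revenue change?"
query = embedder.encode([question], normalize_embeddings=True)
_, ids = index.search(np.asarray(query, dtype="float32"), 2)
context = "\n".join(chunks[i] for i in ids[0])

# Ask the model through vLLM's OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
reply = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model name
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(reply.choices[0].message.content)
```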

## Project Structure

```
document-chatbot/
├── app/
│   ├── api/
│   │   └── routes.py                 # FastAPI routes
│   └── core/
│       ├── config.py                 # Configuration settings
│       ├── document_processor.py     # PDF processing
│       ├── exceptions.py             # Custom exceptions
│       ├── llm_manager.py            # LLM integration
│       └── memory_manager.py         # Conversation memory
├── frontend/
│   ├── app.py                        # Streamlit application
│   └── components/
│       ├── chat_interface.py         # Streamlit chat interface
│       ├── document_uploader.py      # File upload handling
│       └── model_selector.py         # Model selector
├── scripts/
│   └── start_services.py             # Service orchestration
├── data/
│   ├── uploads/                      # Document storage
│   └── vector_store/                 # FAISS indexes
├── Dockerfile
├── docker-compose.yml
├── vllm_env.yaml                     # Conda environment
└── README.md
```
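
As an illustration, `app/core/config.py` typically centralizes settings like these (hypothetical field names and defaults, not the project's actual configuration):

```python
# Hypothetical shape of app/core/config.py; the real settings may differ.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    vllm_base_url: str = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
    upload_dir: str = os.environ.get("UPLOAD_DIR", "data/uploads")
    vector_store_dir: str = os.environ.get("VECTOR_STORE_DIR", "data/vector_store")
    tensor_parallel_size: int = int(os.environ.get("TENSOR_PARALLEL_SIZE", "1"))


settings = Settings()
```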

## Installation

### Native Setup

1. Install the NVIDIA driver and CUDA Toolkit:

```bash
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/

# Install CUDA
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-1
```
2. Install Miniconda:

```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
```
3. Clone the repository:

```bash
git clone https://github.com/yourusername/document-chatbot.git
cd document-chatbot
```
4. Create and activate the conda environment:

```bash
conda env create -f vllm_env.yaml
conda activate vllm_env
```
5. Create the required data directories:

```bash
mkdir -p data/uploads data/vector_store
```
6. Start the services:

```bash
python scripts/start_services.py
```
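
For reference, a launcher like `scripts/start_services.py` can be as simple as spawning the three processes. This is a hedged sketch with assumed module paths and model name, not the script's actual contents:

```python
# Hypothetical sketch of scripts/start_services.py; the real script may differ.
import subprocess

procs = [
    # vLLM OpenAI-compatible server (model name and flags are assumptions)
    subprocess.Popen([
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "meta-llama/Meta-Llama-3-8B-Instruct",
        "--port", "8000",
        "--tensor-parallel-size", "1",
    ]),
    # FastAPI backend (assumed application path)
    subprocess.Popen(["uvicorn", "app.api.routes:app", "--port", "8080"]),
    # Streamlit frontend
    subprocess.Popen(["streamlit", "run", "frontend/app.py", "--server.port", "8501"]),
]

# Block until any service exits.
for p in procs:
    p.wait()
```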

### Docker Setup

1. Install Docker:

```bash
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Add your user to the docker group
sudo usermod -aG docker $USER
newgrp docker
```
2. Install the NVIDIA Container Toolkit:

```bash
# Add NVIDIA package repositories
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install NVIDIA Docker support
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
```
3. Clone and build:

```bash
git clone https://github.com/yourusername/document-chatbot.git
cd document-chatbot

# Build and start the services
docker-compose up --build
```
4. Stop the services:

```bash
docker-compose down
```

## Accessing the Services

- Streamlit UI: http://localhost:8501
- FastAPI docs: http://localhost:8080/docs
- vLLM API: http://localhost:8000/v1
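
Once the services are up, you can smoke-test the vLLM endpoint directly, since vLLM exposes an OpenAI-compatible API. The model name below is an assumption; use whatever model the server was launched with:

```python
# Streaming smoke test against the vLLM OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model name
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
)

# Print tokens as they arrive.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```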

## Usage

Uploading documents:

- Use the sidebar uploader
- PDF files are supported
- Wait for the processing confirmation before chatting

Chatting:

- Type questions into the chat box
- View the source citations attached to each response
- Clear the chat history as needed
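
The backend can also be scripted over HTTP. The route names below are hypothetical illustrations; check the FastAPI docs at http://localhost:8080/docs for the actual paths:

```python
# Hypothetical endpoints for illustration; consult the live API docs
# for the real routes exposed by app/api/routes.py.
import requests

API = "http://localhost:8080"

# Upload a PDF for processing.
with open("report.pdf", "rb") as f:
    r = requests.post(f"{API}/upload", files={"file": ("report.pdf", f, "application/pdf")})
r.raise_for_status()

# Ask a question about the uploaded document.
r = requests.post(f"{API}/chat", json={"message": "Summarize the key findings."})
print(r.json())
```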

## License

MIT License - see the LICENSE file for details.

## Acknowledgments

- The vLLM team for the inference engine
- Hugging Face for model hosting
- The FastAPI and Streamlit teams

## Support

- GitHub Issues: Project Issues
- Email: [email protected]