Qwen2-Colpali-OCR

This application demonstrates a Multimodal Retrieval-Augmented Generation (RAG) system using the Qwen2-VL model and a custom RAG implementation. It allows users to upload images and ask questions about them, combining visual and textual information to generate responses.

A live demo is deployed on Hugging Face Spaces: https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr

Prerequisites

Python 3.8+
PyTorch 2.4.1
Torchvision 0.19.1
Qwen2-VL
Byaldi
CUDA-compatible GPU (recommended for optimal performance)
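
To confirm that an environment matches the versions listed above, a small check along these lines can help. This is only a sketch and not part of the repository: the `check_environment` helper and the `REQUIRED` mapping are hypothetical names, and the distribution names `torch` and `torchvision` are assumed to be the ones installed by `requirements.txt`.

```python
import sys
from importlib import metadata

# Assumed pins, copied from the prerequisites list above.
REQUIRED = {"torch": "2.4.1", "torchvision": "0.19.1"}


def check_environment(required=REQUIRED):
    """Return {package: (wanted, found, ok)} for each required package.

    `found` is None when the package is not installed. A prefix match is
    used because local builds may report versions such as "2.4.1+cu121".
    """
    report = {}
    for package, wanted in required.items():
        try:
            found = metadata.version(package)
        except metadata.PackageNotFoundError:
            found = None
        ok = found is not None and found.startswith(wanted)
        report[package] = (wanted, found, ok)
    return report


if __name__ == "__main__":
    print("Python 3.8+:", sys.version_info >= (3, 8))
    for package, (wanted, found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: wanted {wanted}, found {found} [{status}]")
```

Run it inside the activated virtual environment after installing the requirements.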

Installation

Clone the repository:

git clone https://github.com/Claytonn7/qwen2-colpali-ocr.git
cd qwen2-colpali-ocr

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install the required packages:
```
pip install -r requirements.txt
```

Running the Application Locally

Ensure you're in the project directory and your virtual environment is activated.
Run the Streamlit app:
```
streamlit run app.py
```
Open a web browser and navigate to the URL provided by Streamlit (usually http://localhost:8501).

Usage

Choose to upload an image or use the example image.
If uploading, select an image file (PNG, JPG, or JPEG).
Enter a single keyword in the provided input field.
Adjust the maximum number of tokens for the response using the slider.
View the extracted text from the image, with the searched keyword highlighted.

NB: See the examples-app directory in this repo for example screenshots.
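
The keyword highlighting described in the last step could be done roughly as follows. This is a minimal sketch, not the code in app.py: the `highlight_keyword` helper is hypothetical, and it assumes a case-insensitive match marked with Markdown bold, which Streamlit can render.

```python
import re


def highlight_keyword(text: str, keyword: str) -> str:
    """Wrap every case-insensitive match of `keyword` in Markdown bold markers."""
    keyword = keyword.strip()
    if not keyword:
        # Nothing to search for: return the text unchanged.
        return text
    # re.escape keeps characters such as "." or "+" from acting as regex syntax.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    # Reuse the matched text so the original casing is preserved.
    return pattern.sub(lambda match: f"**{match.group(0)}**", text)


extracted = "Invoice total: 42 USD. The invoice is due in May."
print(highlight_keyword(extracted, "invoice"))
# prints: **Invoice** total: 42 USD. The **invoice** is due in May.
```

Escaping the keyword matters because user input is treated as literal text here, not as a regular expression.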

Disclaimer

The hosted demo runs on the free tier of Hugging Face Spaces, which provides CPU only, so processing is very slow. For better performance, run the app locally on a machine with a CUDA-compatible GPU.
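
When running locally, the choice between GPU and CPU can be made at startup. The sketch below is an assumption about how this might be done, not the logic in app.py; `pick_device` is a hypothetical helper, and it falls back to CPU when PyTorch is missing or no CUDA device is visible.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report CPU so the caller can still proceed.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

On the free Spaces tier this would report `cpu`, which is consistent with the slow processing times noted above.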

Acknowledgments

This project uses the Qwen2-VL model from Hugging Face.
It also uses the Byaldi implementation of the ColPali model.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the GPL-2.0 License.
