
State of GPU support #133

Open
ViktorooReps opened this issue Oct 8, 2024 · 4 comments

Comments

@ViktorooReps

Hello Deep Search Team!

Thank you for this contribution to open source!

We are considering using your library to parse PDF files for LLM training, so we will potentially need to scale things up. Do you have any updates on GPU/multi-GPU support? Maybe some directions on where to start if we were to work on GPU support ourselves?

@dolfim-ibm
Contributor

Hi @ViktorooReps, thanks for reaching out.

We are planning some performance improvements in the coming days/weeks. If you are willing to contribute, it will certainly be appreciated.

Performance will be addressed in three ways:

  1. Faster PDF backend. Here we have a WIP branch in feat: new experimental docling-parse v2 backend #131
  2. (Re-)enable multi-threading for page batches. This was previously disabled because some components are not thread-safe and caused segfaults.
  3. Make efficient use of GPUs for the models.

The initial thoughts about 3 are:

  • Make sure the models can use cuda as the torch device.
  • In the processing code, we have to make better use of batches across pages. For example, we currently analyze all the tables on a page together, but we could also accumulate tables across pages (e.g. 16 at a time) and run an efficient batched inference on the GPU for those.
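The cross-page accumulation idea could be sketched roughly as below. This is not docling's actual API, just a minimal illustration; `batched` is a hypothetical helper name, and the items would be, e.g., table crops collected from many pages before a single GPU inference call.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Accumulate items (e.g. tables from many pages) into fixed-size batches."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch  # a full batch is ready for one GPU inference call
            batch = []
    if batch:
        yield batch  # flush the final partial batch

# Hypothetical usage: instead of one model call per page,
# gather tables across all pages and infer in batches of 16.
# for batch in batched(all_tables_across_pages, 16):
#     model.predict(batch)
```

The point of the design is that a batch of 16 tables keeps the GPU busy with one large kernel launch instead of many small per-page calls.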

@leviataniac

We have run a RAG pipeline multiple times with the examples included here, together with Milvus, even scaling up on NVIDIA L4 GPU machines, and it worked very well. Building the Docker image for that was a bit challenging, but it seems to perform better. We did not collect real performance metrics, but from our observations it is at least 2x faster. Looking forward to the v2 implementation; thank you guys for the great job.

FYI, here is the start of the Dockerfile for getting things running in the Docker image. Ensure that the drivers are properly passed through to Docker with GPU capabilities, so that the GPU is actually used:

##########
FROM nvidia/cuda:12.6.1-runtime-ubuntu24.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-venv python3-dev python3-pip cron git curl && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

RUN python3 -m venv /opt/venv

# install PyTorch with GPU support (CUDA 12.1 wheels, compatible with the 12.6 runtime image)
RUN /opt/venv/bin/pip install --upgrade pip && \
    /opt/venv/bin/pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

.......
###########
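To confirm the driver passthrough actually worked inside the running container, a small stdlib-only check like the following can help (this is not part of docling; `gpu_visible` is a hypothetical helper, and it only verifies that `nvidia-smi` is reachable, not that torch will use it):

```python
import shutil
import subprocess

def gpu_visible() -> bool:
    """Return True if the NVIDIA driver is exposed inside the container."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False  # driver utilities not mounted into the container
    try:
        # exit code 0 means the driver answered and at least one GPU is visible
        return subprocess.run([smi], capture_output=True).returncode == 0
    except OSError:
        return False

if __name__ == "__main__":
    print("GPU visible:", gpu_visible())
```

Inside the image above, `torch.cuda.is_available()` is the complementary check on the PyTorch side.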

@dolfim-ibm
Contributor

@leviataniac thanks for sharing this!
