CUDA OOM with large PDFs #50

Open
infinitytrans opened this issue Feb 27, 2025 · 2 comments
Labels: bug (Something isn't working)

Comments

@infinitytrans

🐛 Describe the bug

Component: Pipeline (pipeline.py)
Version: Latest as of February 27, 2025 (assumed from git clone)
Environment:
OS: Ubuntu
Python: 3.12
GPU: NVIDIA RTX 3090 (24 GB VRAM)
CUDA: 12.4
SGLang: 0.4.2
PyTorch: Installed via olmocr dependencies

Description:
When processing large PDF files (e.g., tests/2.pdf with 4180 pages), the olmocr.pipeline module excessively allocates CUDA memory (VRAM) during the preparation phase, leading to a torch.OutOfMemoryError. This occurs even when parameters like --pages_per_group and --workers are set to limit the number of pages processed simultaneously. The issue appears to stem from the pipeline loading or rendering all pages into CUDA memory at once, rather than respecting the batch size defined by --pages_per_group.
Steps to Reproduce:

1. Set up the olmocr environment:

git clone https://github.com/allenai/olmocr.git
cd olmocr
pip install -e .
pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/

2. Start the SGLang server:

python -m sglang.launch_server --model-path allenai/olmOCR-7B-0225-preview --port 30024

3. Run the pipeline with a large PDF (e.g., 4180 pages):

python -m olmocr.pipeline ./localworkspace --pdfs tests/2.pdf --model allenai/olmOCR-7B-0225-preview --pages_per_group 10 --workers 1

4. Monitor VRAM usage:

watch -n 1 nvidia-smi

Actual Result:

The pipeline process allocates excessive CUDA memory (~3 GiB observed, growing with PDF size) before sending data to the server.
Logs show "Got 4180 pages to do for tests/2.pdf in worker 0", indicating that all pages are prepared at once, ignoring --pages_per_group 10. VRAM fills up (e.g., 20.45 GiB used by the server + 3.18 GiB by the pipeline), triggering:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB. GPU 0 has a total capacity of 23.68 GiB of which 111.12 MiB is free.

Expected Result:
The pipeline should respect --pages_per_group 10, preparing and processing only 10 pages at a time in CUDA memory. VRAM usage by the pipeline should remain minimal (e.g., <1-2 GiB for 10 pages), allowing large PDFs (1000+ pages) to be processed without OOM errors.

Additional Information:
Pipeline VRAM usage spikes during PDF preparation, before server inference begins (server logs show no significant activity beyond initial requests).
Setting CUDA_VISIBLE_DEVICES="" prevents CUDA usage but crashes the pipeline with RuntimeError: No CUDA GPUs are available, indicating a hard dependency on CUDA.
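
For illustration, a softer version of that GPU check could look roughly like the sketch below. This is illustrative only, not olmocr's actual check.py code; has_usable_gpu and the 15 GiB free-VRAM threshold are made up here.

# Rough sketch: warn and fall back to CPU rendering instead of raising
# when no CUDA device (or not enough free VRAM) is available.
import logging

import torch

logger = logging.getLogger(__name__)

def has_usable_gpu(min_free_gib: float = 15.0) -> bool:
    """Return True only if a CUDA device with enough free VRAM is available."""
    if not torch.cuda.is_available():
        logger.warning("No CUDA GPU detected; falling back to CPU-only rendering.")
        return False
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1024**3 >= min_free_gib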

Suggested Fix:
Modify pipeline.py to load and render PDF pages incrementally, in batches defined by --pages_per_group, instead of loading all pages into CUDA memory at once (see the sketch below).
Optionally, allow CPU-only rendering as a fallback (remove the GPU check in check.py:38 or make it optional).
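
To make the first suggestion concrete, here is a rough sketch of the kind of chunking I mean. It is not a patch against pipeline.py; render_page and submit_to_sglang are placeholder names rather than real olmocr functions, and only the grouping logic is meant literally.

# Rough sketch: chunk page numbers so that at most --pages_per_group pages
# are rendered (and resident in GPU memory) at any one time.
from typing import Iterator, List

from pypdf import PdfReader  # pypdf is already in the dependency list below

def page_groups(pdf_path: str, pages_per_group: int) -> Iterator[List[int]]:
    """Yield 1-based page numbers in chunks of at most pages_per_group."""
    num_pages = len(PdfReader(pdf_path).pages)
    for start in range(1, num_pages + 1, pages_per_group):
        yield list(range(start, min(start + pages_per_group, num_pages + 1)))

# Hypothetical worker loop:
# for group in page_groups("tests/2.pdf", pages_per_group=10):
#     images = [render_page("tests/2.pdf", page) for page in group]
#     submit_to_sglang(images)  # the previous group's buffers can be freed here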

Logs:
2025-02-27 01:41:39,217 - main - INFO - Got 4180 pages to do for tests/2.pdf in worker 0
[...]

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB. GPU 0 has a total capacity of 23.68 GiB of which 111.12 MiB is free. Process 34417 has 20.45 GiB memory in use. Including non-PyTorch memory, this process has 3.18 GiB memory in use.

Versions

aiohappyeyeballs==2.4.6
aiohttp==3.11.13
aiosignal==1.3.2
annotated-types==0.7.0
anthropic==0.47.2
anyio==4.8.0
asttokens==3.0.0
attrs==25.1.0
beaker-py==1.34.1
bitsandbytes==0.45.3
bleach==6.2.0
boto3==1.37.2
botocore==1.37.2
cached_path==1.6.7
cachetools==5.5.2
certifi==2025.1.31
cffi==1.17.1
charset-normalizer==3.4.1
click==8.1.8
cloudpickle==3.1.1
compressed-tensors==0.8.0
cryptography==44.0.1
cuda-bindings==12.8.0
cuda-python==12.8.0
datasets==3.3.2
decorator==5.2.1
decord==0.6.0
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
docker==7.1.0
einops==0.8.1
executing==2.2.0
fastapi==0.115.8
filelock==3.17.0
flashinfer==0.1.6+cu124torch2.4
frozenlist==1.5.0
fsspec==2024.12.0
ftfy==6.3.1
fuzzysearch==0.7.3
gguf==0.10.0
google-api-core==2.24.1
google-auth==2.38.0
google-cloud-core==2.4.2
google-cloud-storage==2.19.0
google-crc32c==1.6.0
google-resumable-media==2.7.2
googleapis-common-protos==1.68.0
h11==0.14.0
hf_transfer==0.1.9
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.27.1
idna==3.10
importlib_metadata==8.6.1
interegular==0.3.3
ipython==8.32.0
jedi==0.19.2
Jinja2==3.1.5
jiter==0.8.2
jmespath==1.0.1
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
lark==1.2.2
lingua-language-detector==2.0.2
litellm==1.61.17
llvmlite==0.44.0
lm-format-enforcer==0.10.11
markdown-it-py==3.0.0
markdown2==2.5.3
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mistral_common==1.5.3
modelscope==1.23.1
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.1.0
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
numba==0.61.0
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-ml-py==12.570.86
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
-e git+https://github.com/allenai/olmocr.git@bd08fdb4761538c96224ace9e951e5d956589790#egg=olmocr
openai==1.64.0
opencv-python-headless==4.11.0.86
orjson==3.10.15
outlines==0.0.46
packaging==24.2
pandas==2.2.3
parso==0.8.4
partial-json-parser==0.2.1.1.post5
pexpect==4.9.0
pillow==11.1.0
prometheus-fastapi-instrumentator==7.0.2
prometheus_client==0.21.1
prompt_toolkit==3.0.50
propcache==0.3.0
proto-plus==1.26.0
protobuf==5.29.3
psutil==7.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==19.0.1
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycountry==24.6.1
pycparser==2.22
pydantic==2.10.6
pydantic_core==2.27.2
Pygments==2.19.1
pypdf==5.3.0
pypdfium2==4.30.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.20
pytz==2025.1
PyYAML==6.0.2
pyzmq==26.2.1
RapidFuzz==3.12.1
ray==2.42.1
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.23.1
rsa==4.9
s3transfer==0.11.3
safetensors==0.5.3
sentencepiece==0.2.0
sequence_align==0.2.0
setproctitle==1.3.5
setuptools==75.8.2
sgl-kernel==0.0.3.post1
sglang==0.4.2
six==1.17.0
smart-open==7.1.0
sniffio==1.3.1
stack-data==0.6.3
starlette==0.45.3
sympy==1.13.1
tiktoken==0.9.0
tokenizers==0.21.0
torch==2.5.1
torchao==0.8.0
torchvision==0.20.1
tqdm==4.67.1
traitlets==5.14.3
transformers==4.49.0
triton==3.1.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
vllm==0.6.4.post1
watchfiles==1.0.4
wcwidth==0.2.13
webencodings==0.5.1
websockets==15.0
wrapt==1.17.2
xformers==0.0.28.post3
xgrammar==0.1.14
xxhash==3.5.0
yarl==1.18.3
zipp==3.21.0
zstandard==0.23.0

@jakep-allenai
Collaborator

Hey, thanks for the report. A few follow-up questions:

  • What GPU exactly are you using?
  • Can you link to your PDF if it is not private?

@jakep-allenai
Collaborator

Nvm, I see it's a 3090. That's strange, I've tested on 4090s. Please share your PDF if you can.
