🐛 Describe the bug
Component: Pipeline (pipeline.py)
Version: latest as of February 27, 2025 (installed from a git clone)
Environment:
OS: Ubuntu
Python: 3.12
GPU: NVIDIA RTX 3090 (24 GB VRAM)
CUDA: 12.4
SGLang: 0.4.2
PyTorch: Installed via olmocr dependencies
Description:
When processing large PDF files (e.g., tests/2.pdf with 4180 pages), the olmocr.pipeline module excessively allocates CUDA memory (VRAM) during the preparation phase, leading to a torch.OutOfMemoryError. This occurs even when parameters like --pages_per_group and --workers are set to limit the number of pages processed simultaneously. The issue appears to stem from the pipeline loading or rendering all pages into CUDA memory at once, rather than respecting the batch size defined by --pages_per_group.
Steps to Reproduce:
1. Set up the olmocr environment.
2. Start the SGLang server.
3. Run the pipeline on a large PDF (e.g., tests/2.pdf with 4180 pages) with --pages_per_group 10 and --workers set to limit concurrency.
4. Monitor VRAM usage (e.g., with nvidia-smi).
Actual Result:
The pipeline process allocates excessive CUDA memory (~3 GiB observed, growing with PDF size) before sending data to the server.
Logs show "Got 4180 pages to do for tests/2.pdf in worker 0", indicating that all pages are prepared at once, ignoring --pages_per_group 10. VRAM fills up (e.g., 20.45 GiB used by the server plus 3.18 GiB by the pipeline), triggering:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB. GPU 0 has a total capacity of 23.68 GiB of which 111.12 MiB is free.
Expected Result:
The pipeline should respect --pages_per_group 10, preparing and processing only 10 pages at a time in CUDA memory. VRAM usage by the pipeline should remain minimal (e.g., <1-2 GiB for 10 pages), allowing large PDFs (1000+ pages) to be processed without OOM errors.
Additional Information:
Pipeline VRAM usage spikes during PDF preparation, before server inference begins (server logs show no significant activity beyond initial requests).
Setting CUDA_VISIBLE_DEVICES="" prevents CUDA usage but crashes the pipeline with RuntimeError: No CUDA GPUs are available, indicating a hard dependency on CUDA.
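For illustration, here is a minimal sketch of how the hard CUDA requirement could be relaxed so that CPU-only runs fail gracefully instead of crashing. The function name and the require_gpu parameter are hypothetical, not olmocr's actual API:

```python
# Hypothetical sketch only -- not olmocr's actual code. Shows a device check
# that falls back to CPU instead of raising when no GPU is visible.
import torch


def resolve_device(require_gpu: bool = False) -> str:
    """Return "cuda" when a usable GPU is present, otherwise "cpu"."""
    if torch.cuda.is_available():
        return "cuda"
    if require_gpu:
        # Keep today's strict behavior available for callers that need a GPU.
        raise RuntimeError("No CUDA GPUs are available")
    return "cpu"
```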
Suggested Fix:
Modify pipeline.py to incrementally load and render PDF pages in batches defined by --pages_per_group, rather than loading all pages into CUDA memory at once (a minimal sketch of this batching follows this list).
Optionally, allow CPU-only rendering as a fallback (remove, or make optional, the GPU check at check.py:38; see the device-selection sketch above).
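To make the first suggestion concrete, here is a minimal sketch of group-wise page processing. It assumes rendering and submission can be expressed as per-page and per-group callables; render_page and submit_group are placeholders, not olmocr functions:

```python
# Illustrative sketch only -- not olmocr's actual implementation. Pages are
# rendered and submitted one group at a time, so peak memory is bounded by
# pages_per_group rather than by the total page count of the PDF.
from collections.abc import Callable, Iterable, Sequence


def iter_page_groups(num_pages: int, pages_per_group: int) -> Iterable[range]:
    """Yield contiguous page-index ranges of at most pages_per_group pages."""
    for start in range(0, num_pages, pages_per_group):
        yield range(start, min(start + pages_per_group, num_pages))


def process_pdf(
    num_pages: int,
    pages_per_group: int,
    render_page: Callable[[int], bytes],               # placeholder renderer
    submit_group: Callable[[Sequence[bytes]], None],   # placeholder inference call
) -> None:
    for group in iter_page_groups(num_pages, pages_per_group):
        # Only this group's rendered pages are held in memory at a time.
        images = [render_page(page) for page in group]
        submit_group(images)
        # `images` is rebound on the next iteration, so the previous group
        # becomes eligible for garbage collection.
```

With --pages_per_group 10 this would keep roughly 10 rendered pages in memory at any time, independent of the 4180-page total.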
Logs:
2025-02-27 01:41:39,217 - main - INFO - Got 4180 pages to do for tests/2.pdf in worker 0
[...]
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB. GPU 0 has a total capacity of 23.68 GiB of which 111.12 MiB is free. Process 34417 has 20.45 GiB memory in use. Including non-PyTorch memory, this process has 3.18 GiB memory in use.
Versions
aiohappyeyeballs==2.4.6
aiohttp==3.11.13
aiosignal==1.3.2
annotated-types==0.7.0
anthropic==0.47.2
anyio==4.8.0
asttokens==3.0.0
attrs==25.1.0
beaker-py==1.34.1
bitsandbytes==0.45.3
bleach==6.2.0
boto3==1.37.2
botocore==1.37.2
cached_path==1.6.7
cachetools==5.5.2
certifi==2025.1.31
cffi==1.17.1
charset-normalizer==3.4.1
click==8.1.8
cloudpickle==3.1.1
compressed-tensors==0.8.0
cryptography==44.0.1
cuda-bindings==12.8.0
cuda-python==12.8.0
datasets==3.3.2
decorator==5.2.1
decord==0.6.0
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
docker==7.1.0
einops==0.8.1
executing==2.2.0
fastapi==0.115.8
filelock==3.17.0
flashinfer==0.1.6+cu124torch2.4
frozenlist==1.5.0
fsspec==2024.12.0
ftfy==6.3.1
fuzzysearch==0.7.3
gguf==0.10.0
google-api-core==2.24.1
google-auth==2.38.0
google-cloud-core==2.4.2
google-cloud-storage==2.19.0
google-crc32c==1.6.0
google-resumable-media==2.7.2
googleapis-common-protos==1.68.0
h11==0.14.0
hf_transfer==0.1.9
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.27.1
idna==3.10
importlib_metadata==8.6.1
interegular==0.3.3
ipython==8.32.0
jedi==0.19.2
Jinja2==3.1.5
jiter==0.8.2
jmespath==1.0.1
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
lark==1.2.2
lingua-language-detector==2.0.2
litellm==1.61.17
llvmlite==0.44.0
lm-format-enforcer==0.10.11
markdown-it-py==3.0.0
markdown2==2.5.3
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mistral_common==1.5.3
modelscope==1.23.1
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.1.0
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
numba==0.61.0
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-ml-py==12.570.86
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
-e git+https://github.com/allenai/olmocr.git@bd08fdb4761538c96224ace9e951e5d956589790#egg=olmocr
openai==1.64.0
opencv-python-headless==4.11.0.86
orjson==3.10.15
outlines==0.0.46
packaging==24.2
pandas==2.2.3
parso==0.8.4
partial-json-parser==0.2.1.1.post5
pexpect==4.9.0
pillow==11.1.0
prometheus-fastapi-instrumentator==7.0.2
prometheus_client==0.21.1
prompt_toolkit==3.0.50
propcache==0.3.0
proto-plus==1.26.0
protobuf==5.29.3
psutil==7.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==19.0.1
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycountry==24.6.1
pycparser==2.22
pydantic==2.10.6
pydantic_core==2.27.2
Pygments==2.19.1
pypdf==5.3.0
pypdfium2==4.30.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.20
pytz==2025.1
PyYAML==6.0.2
pyzmq==26.2.1
RapidFuzz==3.12.1
ray==2.42.1
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.23.1
rsa==4.9
s3transfer==0.11.3
safetensors==0.5.3
sentencepiece==0.2.0
sequence_align==0.2.0
setproctitle==1.3.5
setuptools==75.8.2
sgl-kernel==0.0.3.post1
sglang==0.4.2
six==1.17.0
smart-open==7.1.0
sniffio==1.3.1
stack-data==0.6.3
starlette==0.45.3
sympy==1.13.1
tiktoken==0.9.0
tokenizers==0.21.0
torch==2.5.1
torchao==0.8.0
torchvision==0.20.1
tqdm==4.67.1
traitlets==5.14.3
transformers==4.49.0
triton==3.1.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
vllm==0.6.4.post1
watchfiles==1.0.4
wcwidth==0.2.13
webencodings==0.5.1
websockets==15.0
wrapt==1.17.2
xformers==0.0.28.post3
xgrammar==0.1.14
xxhash==3.5.0
yarl==1.18.3
zipp==3.21.0
zstandard==0.23.0