Issue: docling-serve GPU Memory Leak #1079

Open
sir3mat opened this issue Feb 28, 2025 · 4 comments
@sir3mat

sir3mat commented Feb 28, 2025

Question: docling-serve GPU Memory Leak

Description: docling-serve exhibits steadily increasing GPU memory usage over time when processing a consistent stream of documents, suggesting a memory leak. This leads to potential OOM errors.

Expected: GPU memory should plateau after model loading and remain relatively stable.

Actual: GPU memory usage continuously increases, as seen via nvidia-smi.

Steps to Reproduce:

1. Set up docling-serve with a GPU backend (e.g., pdf_backend: dlparse_v2) and OCR enabled (do_ocr: true).
2. Send repeated requests to /v1alpha/convert/source or /v1alpha/convert/file with PDF documents. Example curl commands are provided in the original, longer issue description.
3. Monitor GPU memory usage with nvidia-smi.
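
To make step 3 concrete, here is a minimal monitoring sketch that can run alongside the request loop; it only assumes nvidia-smi is on PATH (the query flags are standard nvidia-smi options):

```python
# Minimal sketch for step 3: poll GPU memory via nvidia-smi while requests are running.
import subprocess
import time

def gpu_memory_used_mib() -> list[int]:
    """Return the memory.used value (MiB) for each visible GPU as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return [int(line.strip()) for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), "memory.used MiB:", gpu_memory_used_mib())
        time.sleep(5)
```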

How can this be avoided in GPU-limited resource environments?
I have an H100 with a 12 GB MIG slice, but after about 4 PDFs it starts to throw OOM errors.
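
One possible client-side guard (a rough sketch, not a fix for the underlying growth): check free GPU memory via the nvidia-ml-py (pynvml) bindings before submitting the next document, and wait when it is low. The threshold below is an arbitrary example, and a MIG slice may need a different device handle than index 0:

```python
# Rough sketch: wait until enough VRAM is free before submitting the next document.
# Assumes nvidia-ml-py (pynvml) is installed; device index 0 and the threshold are examples.
import time
import pynvml

MIN_FREE_MIB = 2048  # example threshold; tune for the models in use

def wait_for_free_vram(min_free_mib: int = MIN_FREE_MIB, poll_s: float = 5.0) -> None:
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        while True:
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            free_mib = mem.free // (1024 * 1024)
            if free_mib >= min_free_mib:
                return
            print(f"only {free_mib} MiB free, waiting before next document...")
            time.sleep(poll_s)
    finally:
        pynvml.nvmlShutdown()
```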

docling-serve request params
I use docling-serve with these params:
params = {
    "from_formats": [
        "docx",
        "pptx",
        "html",
        "image",
        "pdf",
        "asciidoc",
        "md",
        "csv",
        "xlsx",
        "xml_uspto",
        "xml_jats",
        "json_docling",
    ],
    "to_formats": ["md"],
    "image_export_mode": "placeholder",
    "do_ocr": True,
    "force_ocr": False,
    "ocr_engine": "easyocr",
    "ocr_lang": None,
    "pdf_backend": "dlparse_v2",
    "table_mode": "accurate",
    "abort_on_error": False,
    "return_as_file": False,
    "do_table_structure": True,
    "include_images": True,
    "images_scale": 2.0,
}
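
A minimal sketch of how such a params dict is wrapped into a request to the /v1alpha/convert/source endpoint; the host/port are placeholders, and the "options"/"http_sources" envelope follows the settings shown later in this thread:

```python
# Minimal sketch: POST the params above as the "options" object to convert/source.
# Host/port are placeholders; adjust to wherever docling-serve is deployed.
import requests

DOCLING_SERVE_URL = "http://localhost:5001/v1alpha/convert/source"  # placeholder endpoint

payload = {
    "options": params,  # the dict defined above
    "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}],
}

resp = requests.post(DOCLING_SERVE_URL, json=payload, timeout=600)
resp.raise_for_status()
print("status:", resp.status_code, "response keys:", list(resp.json().keys()))
```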

@sir3mat sir3mat added the question Further information is requested label Feb 28, 2025
@vku-ibm

vku-ibm commented Mar 6, 2025

We ran tests internally. The infrastructure is an OpenShift cluster.
Pod reservation specs:

  • single A10 GPU
  • cpu 9
  • ram 24Gb

The settings that were used:
"options": { "from_formats": [ "docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx", "xml_uspto", "xml_jats", "json_docling" ], "to_formats": ["md"], "image_export_mode": "placeholder", "do_ocr": true, "force_ocr": false, "ocr_engine": "easyocr", "ocr_lang": [], "pdf_backend": "dlparse_v2", "table_mode": "accurate", "abort_on_error": false, "return_as_file": false, "do_table_structure": true, "include_images": true, "images_scale": 2 },

Tried to convert several single documents:
https://arxiv.org/pdf/2206.01062
https://arxiv.org/pdf/2501.17887
https://arxiv.org/pdf/2411.19710

And then all three of them as a single payload.

The observed behavior, watching nvidia-smi, is the following:

  • the memory mark goes up at the first request
  • if the same document is submitted again and again, the mark stays at the same level
  • when new documents are submitted, the mark can go up or stay where it was, depending on the document
  • when all three documents were submitted, the mark stayed at 3229MiB
  • the mark doesn't change while the system is idling

I'm not sure whether the memory mark in nvidia-smi reports actual consumption, or what was "booked" or just "touched", specifically in the case of docling-serve.
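
For what it's worth, nvidia-smi reports everything a process has reserved from the driver; with PyTorch that includes the caching allocator's pool, not only live tensors. If instrumentation can be added inside the serving process (assuming the docling models run through PyTorch), the two numbers can be separated with a sketch like this:

```python
# Sketch: separate live tensor memory from the PyTorch caching allocator's pool.
# Only meaningful inside the serving process itself; assumes the models run on PyTorch.
import torch

def report_cuda_memory(tag: str = "") -> None:
    allocated = torch.cuda.memory_allocated() / 2**20  # MiB held by live tensors
    reserved = torch.cuda.memory_reserved() / 2**20    # MiB kept by the allocator (roughly what nvidia-smi shows)
    print(f"[{tag}] allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")
```

A `reserved` value that keeps climbing while `allocated` stays flat would point at allocator caching or fragmentation rather than tensors that are never freed.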

The payloads were rather similar to each other and potentially didn't trigger the issue.
@sir3mat could you share the documents on which you observed this issue, as long as they are publicly available and accessible? Otherwise a description of their content could help, e.g. whether they are heavy on images or tables, or need full OCR, etc.

@vku-ibm

vku-ibm commented Mar 6, 2025

Additional findings. There is a definite trend of the GPU memory mark moving higher as more documents are submitted, most likely because some documents cause more memory to be reserved than others.

Additionally, when you change conversion options against the same instance, the instance caches a settings object, and there is no limit on the size of that cache. I tried about 20 different permutations of the conversion options with the same document, and while the GPU memory mark does go up more with some options than others, it doesn't seem to "accumulate" or depend on the size of the cache. What does get affected is RAM consumption. If there were more options that could cause more permutations in the models loaded on the GPU, we could potentially see a larger impact.
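
To illustrate the caching point (this is only a sketch of the idea, not docling-serve's actual code; the class and names below are made up for the example): bounding the per-options cache, e.g. with an LRU policy, would cap how many converter/pipeline instances can pile up in RAM as option permutations accumulate.

```python
# Illustrative sketch (not docling-serve code): bound the per-options converter cache
# with an LRU policy so old permutations of conversion options get evicted.
from collections import OrderedDict

class LruConverterCache:
    def __init__(self, max_entries: int = 4):
        self.max_entries = max_entries
        self._cache: "OrderedDict[tuple, object]" = OrderedDict()

    def get_or_create(self, options_key: tuple, factory):
        """Return a cached converter for these options, building it if missing."""
        if options_key in self._cache:
            self._cache.move_to_end(options_key)
            return self._cache[options_key]
        converter = factory()  # e.g. build a converter/pipeline for these options
        self._cache[options_key] = converter
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return converter
```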

So far, the highest mark I've got is 12691MiB, with the following settings and payload:

```json
"do_ocr": true,
"force_ocr": true,
"ocr_engine": "easyocr",
"ocr_lang": [],
"pdf_backend": "dlparse_v2",
"table_mode": "accurate",
"abort_on_error": false,
"return_as_file": false,
"do_table_structure": true,
"include_images": true,
"images_scale": 2

"http_sources": [
  {"url": "https://arxiv.org/pdf/2206.01062"},
  {"url": "https://arxiv.org/pdf/2501.17887"},
  {"url": "https://arxiv.org/pdf/2411.19710"},
  {"url": "https://arxiv.org/pdf/2409.18164"},
  {"url": "https://arxiv.org/pdf/2408.09869"},
  {"url": "https://arxiv.org/pdf/2406.19102"},
  {"url": "https://arxiv.org/pdf/2405.10725"},
  {"url": "https://arxiv.org/pdf/2305.14962"},
  {"url": "https://arxiv.org/pdf/2209.03648"},
  {"url": "https://arxiv.org/pdf/2206.00785"}
]
```

@archasek

archasek commented Mar 6, 2025

  • single A10 GPU
  • cpu 9
  • ram 24Gb

By the way, may I ask how long it takes to process a PDF with such resources?

@dolfim-ibm
Contributor

  • single A10 GPU
  • cpu 9
  • ram 24Gb

Note that these are just the reservation values. The actual usage is 1.5-2 GB of RAM.
