Issue: docling-serve GPU Memory Leak #1079

Open
sir3mat opened this issue Feb 28, 2025 · 4 comments
@sir3mat

sir3mat commented Feb 28, 2025

Question: docling-serve GPU Memory Leak

Description: docling-serve exhibits steadily increasing GPU memory usage over time when processing a consistent stream of documents, suggesting a memory leak. This leads to potential OOM errors.

Expected: GPU memory should plateau after model loading and remain relatively stable.

Actual: GPU memory usage continuously increases, as seen via nvidia-smi.

Steps to Reproduce:

1. Set up docling-serve with a GPU backend (e.g., pdf_backend: dlparse_v2) and OCR enabled (do_ocr: true).
2. Send repeated requests to /v1alpha/convert/source or /v1alpha/convert/file with PDF documents. Example curl commands are provided in the original, longer issue description.
3. Monitor GPU memory usage with nvidia-smi.
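
To make step 3 concrete, here is a minimal monitoring sketch that can run alongside the request loop; it only assumes nvidia-smi is on PATH (the query flags are standard nvidia-smi options):

```python
# Minimal sketch for step 3: poll GPU memory via nvidia-smi while requests are running.
import subprocess
import time

def gpu_memory_used_mib() -> list[int]:
    """Return the memory.used value (MiB) for each visible GPU as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return [int(line.strip()) for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), "memory.used MiB:", gpu_memory_used_mib())
        time.sleep(5)
```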

How can this be avoided in GPU-limited resource environments?
I have an H100 with a 12 GB MIG slice, but after about 4 PDFs it starts to throw OOM errors.
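
One possible client-side guard (a rough sketch, not a fix for the underlying growth): check free GPU memory via the nvidia-ml-py (pynvml) bindings before submitting the next document, and wait when it is low. The threshold below is an arbitrary example, and a MIG slice may need a different device handle than index 0:

```python
# Rough sketch: wait until enough VRAM is free before submitting the next document.
# Assumes nvidia-ml-py (pynvml) is installed; device index 0 and the threshold are examples.
import time
import pynvml

MIN_FREE_MIB = 2048  # example threshold; tune for the models in use

def wait_for_free_vram(min_free_mib: int = MIN_FREE_MIB, poll_s: float = 5.0) -> None:
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        while True:
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            free_mib = mem.free // (1024 * 1024)
            if free_mib >= min_free_mib:
                return
            print(f"only {free_mib} MiB free, waiting before next document...")
            time.sleep(poll_s)
    finally:
        pynvml.nvmlShutdown()
```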

docling-serve request params
I use docling-serve with these params:
params = {
    "from_formats": [
        "docx",
        "pptx",
        "html",
        "image",
        "pdf",
        "asciidoc",
        "md",
        "csv",
        "xlsx",
        "xml_uspto",
        "xml_jats",
        "json_docling",
    ],
    "to_formats": ["md"],
    "image_export_mode": "placeholder",
    "do_ocr": True,
    "force_ocr": False,
    "ocr_engine": "easyocr",
    "ocr_lang": None,
    "pdf_backend": "dlparse_v2",
    "table_mode": "accurate",
    "abort_on_error": False,
    "return_as_file": False,
    "do_table_structure": True,
    "include_images": True,
    "images_scale": 2.0,
}
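
A minimal sketch of how such a params dict is wrapped into a request to the /v1alpha/convert/source endpoint; the host/port are placeholders, and the "options"/"http_sources" envelope follows the settings shown later in this thread:

```python
# Minimal sketch: POST the params above as the "options" object to convert/source.
# Host/port are placeholders; adjust to wherever docling-serve is deployed.
import requests

DOCLING_SERVE_URL = "http://localhost:5001/v1alpha/convert/source"  # placeholder endpoint

payload = {
    "options": params,  # the dict defined above
    "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}],
}

resp = requests.post(DOCLING_SERVE_URL, json=payload, timeout=600)
resp.raise_for_status()
print("status:", resp.status_code, "response keys:", list(resp.json().keys()))
```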

@sir3mat sir3mat added the question Further information is requested label Feb 28, 2025
@vku-ibm

vku-ibm commented Mar 6, 2025

We ran tests internally. The infrastructure is an OpenShift cluster.
Pod reservation specs:

  • single A10 GPU
  • cpu 9
  • ram 24Gb

The settings that were used:
"options": { "from_formats": [ "docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx", "xml_uspto", "xml_jats", "json_docling" ], "to_formats": ["md"], "image_export_mode": "placeholder", "do_ocr": true, "force_ocr": false, "ocr_engine": "easyocr", "ocr_lang": [], "pdf_backend": "dlparse_v2", "table_mode": "accurate", "abort_on_error": false, "return_as_file": false, "do_table_structure": true, "include_images": true, "images_scale": 2 },

Tried to convert several single documents:
https://arxiv.org/pdf/2206.01062
https://arxiv.org/pdf/2501.17887
https://arxiv.org/pdf/2411.19710

And then all three of them as a single payload.

The observed behavior, watching nvidia-smi, is the following:

  • the memory mark goes up at the first request
  • if the same document is submitted again and again, the mark stays at the same level
  • when new documents are submitted, the mark can go up or stay where it was, depending on the document
  • when all three documents were submitted, the mark stayed at 3229MiB
  • the mark doesn't change while the system is idling

I'm not sure whether the memory mark in nvidia-smi reports actual consumption, or what was "booked" or just "touched", specifically in the case of docling-serve.
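
For what it's worth, nvidia-smi reports everything a process has reserved from the driver; with PyTorch that includes the caching allocator's pool, not only live tensors. If instrumentation can be added inside the serving process (assuming the docling models run through PyTorch), the two numbers can be separated with a sketch like this:

```python
# Sketch: separate live tensor memory from the PyTorch caching allocator's pool.
# Only meaningful inside the serving process itself; assumes the models run on PyTorch.
import torch

def report_cuda_memory(tag: str = "") -> None:
    allocated = torch.cuda.memory_allocated() / 2**20  # MiB held by live tensors
    reserved = torch.cuda.memory_reserved() / 2**20    # MiB kept by the allocator (roughly what nvidia-smi shows)
    print(f"[{tag}] allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")
```

A `reserved` value that keeps climbing while `allocated` stays flat would point at allocator caching or fragmentation rather than tensors that are never freed.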

The payloads were rather similar to each other and potentially didn't trigger the issue.
@sir3mat could you share the documents on which you observed this issue, as long as they are publicly available and accessible? Otherwise a description of their content could help, e.g. whether they are heavy on images or tables, or need full OCR, etc.

@vku-ibm

vku-ibm commented Mar 6, 2025

Additional findings. There is a definite trend of the GPU memory mark moving higher as more documents are submitted, most likely because some documents cause more memory to be reserved than others.

Additionally, when you change conversion options against the same instance, the instance caches a settings object, and there is no limit on the size of that cache. I tried about 20 different permutations of the conversion options with the same document, and while the GPU memory mark does go up more with some options than others, it doesn't seem to "accumulate" or depend on the size of the cache. What does get affected is RAM consumption. If there were more options that could cause more permutations in the models loaded on the GPU, we could potentially see a larger impact.
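
To illustrate the caching point (this is only a sketch of the idea, not docling-serve's actual code; the class and names below are made up for the example): bounding the per-options cache, e.g. with an LRU policy, would cap how many converter/pipeline instances can pile up in RAM as option permutations accumulate.

```python
# Illustrative sketch (not docling-serve code): bound the per-options converter cache
# with an LRU policy so old permutations of conversion options get evicted.
from collections import OrderedDict

class LruConverterCache:
    def __init__(self, max_entries: int = 4):
        self.max_entries = max_entries
        self._cache: "OrderedDict[tuple, object]" = OrderedDict()

    def get_or_create(self, options_key: tuple, factory):
        """Return a cached converter for these options, building it if missing."""
        if options_key in self._cache:
            self._cache.move_to_end(options_key)
            return self._cache[options_key]
        converter = factory()  # e.g. build a converter/pipeline for these options
        self._cache[options_key] = converter
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return converter
```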

So far, the highest mark I've got is 12691MiB, with the following settings and payload:

```json
"do_ocr": true,
"force_ocr": true,
"ocr_engine": "easyocr",
"ocr_lang": [],
"pdf_backend": "dlparse_v2",
"table_mode": "accurate",
"abort_on_error": false,
"return_as_file": false,
"do_table_structure": true,
"include_images": true,
"images_scale": 2

"http_sources": [
  {"url": "https://arxiv.org/pdf/2206.01062"},
  {"url": "https://arxiv.org/pdf/2501.17887"},
  {"url": "https://arxiv.org/pdf/2411.19710"},
  {"url": "https://arxiv.org/pdf/2409.18164"},
  {"url": "https://arxiv.org/pdf/2408.09869"},
  {"url": "https://arxiv.org/pdf/2406.19102"},
  {"url": "https://arxiv.org/pdf/2405.10725"},
  {"url": "https://arxiv.org/pdf/2305.14962"},
  {"url": "https://arxiv.org/pdf/2209.03648"},
  {"url": "https://arxiv.org/pdf/2206.00785"}
]
```

@archasek

archasek commented Mar 6, 2025

  • single A10 GPU
  • cpu 9
  • ram 24Gb

By the way, may I ask how long it takes to process a PDF with such resources?

@dolfim-ibm
Contributor

  • single A10 GPU
  • cpu 9
  • ram 24Gb

Note that these are just the reservation values. The actual usage is 1.5-2 GB of RAM.
