Issue: docling-serve GPU Memory Leak #1079
We run tests internally. The infrastructure is an OpenShift cluster.
The settings that were used: we tried to convert several single documents, and then all three of them as a single payload. The observed behavior is as follows:
I'm not sure if the memory mark in nvidia-smi reports actual consumption or what was "booked" or merely "touched", specifically in the case of docling-serve. The payloads were rather similar to each other and potentially didn't trigger the issue.
Additional findings. There is a definite trend of the GPU memory mark moving higher as more documents are submitted, most likely because some documents cause more memory to be reserved than others.

Additionally, when you change conversion options against the same instance, the instance caches the settings object, and there is no limit to the size of that cache. I've tried about 20 different permutations of the conversion options with the same document, and while the GPU memory mark does go up with some options more than others, it doesn't seem to "accumulate" or depend on the size of the cache. What does get affected is RAM consumption. If there were more options that could produce more permutations of the models loaded on the GPU, we could potentially see a larger impact.

So far, the highest mark I've got is 12691MiB, with the following settings and payload: `"http_sources": [` …
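Regarding the unbounded settings cache mentioned above: a minimal sketch of a size-bounded alternative using the standard LRU pattern. The `BoundedConverterCache` class, its key shape, and `build_converter` are hypothetical illustrations, not docling-serve's actual code:

```python
# Sketch: a bounded LRU cache for converter instances keyed by conversion
# options. Evicting the least recently used entry caps cache growth;
# the real docling-serve cache is unbounded per the observation above.
from collections import OrderedDict

class BoundedConverterCache:
    def __init__(self, max_size: int = 8):
        self.max_size = max_size
        self._cache: "OrderedDict[tuple, object]" = OrderedDict()

    def get(self, options_key: tuple, build_converter):
        if options_key in self._cache:
            self._cache.move_to_end(options_key)  # mark as recently used
            return self._cache[options_key]
        converter = build_converter()             # hypothetical factory
        self._cache[options_key] = converter
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)       # evict least recently used
        return converter
```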
By the way, may I ask how much time it takes to process a PDF with such resources?
Note that these are just reservation values. The actual usage is 1.5GB-2GB of RAM.
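The same distinction applies on the GPU side: PyTorch's caching allocator holds on to memory it has freed, so nvidia-smi reports the allocator's reservation rather than live tensor usage. Assuming the models run on CUDA via PyTorch, a small sketch for logging both views from inside the process:

```python
import torch

def log_gpu_memory(tag: str = "") -> None:
    """Print PyTorch's view of GPU memory: allocated (live tensors)
    vs. reserved (allocator cache, which is what nvidia-smi reports)."""
    alloc = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f"[{tag}] allocated={alloc:.0f}MiB "
          f"reserved={reserved:.0f}MiB peak_allocated={peak:.0f}MiB")
```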
Question: docling-serve GPU Memory Leak
Description: docling-serve exhibits steadily increasing GPU memory usage over time when processing a consistent stream of documents, suggesting a memory leak. This leads to potential OOM errors.
Expected: GPU memory should plateau after model loading and remain relatively stable.
Actual: GPU memory usage continuously increases, as seen via nvidia-smi.
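One way to quantify the reported trend is to sample nvidia-smi while a stream of documents is submitted; a minimal watcher sketch (the sampling interval is arbitrary):

```python
# Sketch: sample GPU memory via nvidia-smi every few seconds while
# conversions run, to confirm whether usage plateaus or keeps climbing.
import subprocess
import time

def watch_gpu_memory(interval_s: float = 5.0) -> None:
    while True:
        used = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(f"{time.strftime('%H:%M:%S')} memory.used={used} MiB")
        time.sleep(interval_s)

if __name__ == "__main__":
    watch_gpu_memory()
```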
Steps to Reproduce:
How can this be avoided in GPU-limited resource environments?
I have an H100 with a 12GB VRAM MIG slice, but after 4 PDFs it starts to throw OOM errors.
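No docling-serve knob for this is mentioned in the thread, but when driving Docling as a library one general PyTorch-level mitigation (an assumption, not a documented docling feature) is to release the allocator cache between documents:

```python
# Sketch: general PyTorch hygiene between conversions when using Docling
# as a library. This returns the CUDA allocator's cached blocks to the
# driver; it will not help if live tensors are actually being leaked.
import gc
import torch

def convert_with_cleanup(converter, source):
    result = converter.convert(source)  # assumed Docling converter call
    gc.collect()                        # drop unreachable Python objects first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()        # release cached GPU blocks
    return result
```

Setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` (PyTorch 2.x) can also reduce allocator fragmentation on tightly sized MIG slices, though neither approach fixes a genuine leak.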
docling-serve request params
I use docling-serve with these params:
```python
params = {
    "from_formats": [
        "docx",
        "pptx",
        "html",
        "image",
        "pdf",
        "asciidoc",
        "md",
        "csv",
        "xlsx",
        "xml_uspto",
        "xml_jats",
        "json_docling",
    ],
    "to_formats": ["md"],
    "image_export_mode": "placeholder",
    "do_ocr": True,
    "force_ocr": False,
    "ocr_engine": "easyocr",
    "ocr_lang": None,
    "pdf_backend": "dlparse_v2",
    "table_mode": "accurate",
    "abort_on_error": False,
    "return_as_file": False,
    "do_table_structure": True,
    "include_images": True,
    "images_scale": 2.0,
}
```
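For context, a minimal sketch of how such a request might be submitted; the endpoint path, payload shape, and document URL here are assumptions inferred from the `http_sources` field quoted earlier in the thread, not confirmed docling-serve API details:

```python
# Sketch: POST the params above to a docling-serve instance. The endpoint
# path, payload shape, and document URL are assumptions; check them against
# your docling-serve version's API docs.
import requests

payload = {
    "options": params,  # the dict defined above
    "http_sources": [{"url": "https://example.com/sample.pdf"}],  # hypothetical
}
resp = requests.post(
    "http://localhost:5001/v1alpha/convert/source",
    json=payload,
    timeout=600,
)
resp.raise_for_status()
print(resp.json().keys())
```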