Docling error on memory allocation #1126

pbonito · 2025-03-06T06:37:55Z

Bug

We are parsing multiple documents in parallel using Dataflow. On CPU we did not observe any problems, but since switching to GPU we get the following error:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 252.00 MiB. GPU 0 has a total capacity of 14.58 GiB of which 39.62 MiB is free. Process 165 has 0 bytes memory in use. Process 157 has 0 bytes memory in use. Process 168 has 0 bytes memory in use. Process 170 has 0 bytes memory in use. Of the allocated memory 2.44 GiB is allocated by PyTorch, and 368.84 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
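For reference, the allocator hint at the end of the message can be applied from Python as well as from the shell. The sketch below is only an illustration: the helper name `enable_expandable_segments` is made up, and it assumes the variable is set before `torch` is imported, so that the CUDA caching allocator has not yet been initialized. Note that the log reports only 368.84 MiB reserved but unallocated, so fragmentation may not be the main cause here; the GPU appears to be shared by several processes.

```python
import os


def enable_expandable_segments(env=os.environ):
    """Add expandable_segments:True to PYTORCH_CUDA_ALLOC_CONF, keeping other options.

    The variable holds comma-separated option:value pairs. An existing
    expandable_segments entry is left untouched.
    """
    current = env.get("PYTORCH_CUDA_ALLOC_CONF", "")
    options = [opt for opt in current.split(",") if opt]
    if not any(opt.startswith("expandable_segments:") for opt in options):
        options.append("expandable_segments:True")
    env["PYTORCH_CUDA_ALLOC_CONF"] = ",".join(options)
    return env["PYTORCH_CUDA_ALLOC_CONF"]


# Call this at the very top of the worker entry point, before importing torch
# or anything that imports it (Docling included).
enable_expandable_segments()
```

In a Dataflow worker it may be simpler to set the variable in the container image or worker environment instead, which avoids any import-order concerns.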

This is our accelerator configuration:
accelerator_options = AcceleratorOptions(
    num_threads=4, device=AcceleratorDevice.AUTO
)

I understand that we can improve memory allocation on our side, but should Docling wait until it is able to allocate memory rather than fail?
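Until something like this exists in Docling itself, the "wait instead of fail" behaviour can be approximated in caller code. The following is a minimal sketch, not a Docling feature: `retry_on_oom` and its parameters are hypothetical names, and it simply retries a callable with exponential backoff when one of the given exception types is raised.

```python
import random
import time


def retry_on_oom(fn, *, oom_exceptions, max_attempts=5, base_delay=1.0,
                 on_retry=None, sleep=time.sleep):
    """Call fn(); on an out-of-memory exception, back off and try again.

    oom_exceptions: tuple of exception types to treat as retryable.
    on_retry: optional callable run before each wait (e.g. to free caches).
    sleep: injectable for testing; defaults to time.sleep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except oom_exceptions:
            if attempt == max_attempts:
                raise  # give up and surface the original error
            if on_retry is not None:
                on_retry()
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


# Possible use with PyTorch (assumes `converter` is a Docling
# DocumentConverter and `path` is the input document):
#
#   import torch
#   result = retry_on_oom(
#       lambda: converter.convert(path),
#       oom_exceptions=(torch.OutOfMemoryError,),
#       on_retry=torch.cuda.empty_cache,
#   )
```

Retrying only helps if another process releases GPU memory in the meantime; if a single document needs more memory than the device can offer, the final attempt still raises.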

Steps to reproduce

Launch multiple parsing jobs in parallel on a GPU with:
accelerator_options = AcceleratorOptions(
    num_threads=4, device=AcceleratorDevice.AUTO
)

Docling version

Docling version: 2.16.0
Docling Core version: 2.17.2
Docling IBM Models version: 3.3.1
Docling Parse version: 3.3.0

Python version

Python 3.11.5

PeterStaar-IBM · 2025-03-07T07:55:57Z

@pbonito You are running a rather old version of Docling. Would it be possible to upgrade and run the test again?

Matteo-Omenetti · 2025-03-07T09:14:07Z

@pbonito Could you please also share the PipelineOptions you are using, so that we can understand which models you have enabled?

Thank you

pbonito · 2025-03-07T13:25:04Z

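import os

from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
    TableFormerMode,
)

accelerator_options = AcceleratorOptions(
    num_threads=4, device=AcceleratorDevice.AUTO
)
artifacts_path = os.environ["DOCLING_DIR"]
easy_ocr_path = f"{os.environ['EASY_OCR_DIR']}/model"
# Options for the Dataflow job itself; PipelineOptions and pipeline_args
# come from the surrounding job code (not shown here).
pipeline_options = PipelineOptions(pipeline_args)

parser_pipeline_options = PdfPipelineOptions(artifacts_path=artifacts_path)
parser_pipeline_options.accelerator_options = accelerator_options

# OCR: enabled, using locally stored EasyOCR models (no downloads).
parser_pipeline_options.do_ocr = True
parser_pipeline_options.ocr_options.download_enabled = False
parser_pipeline_options.ocr_options.model_storage_directory = easy_ocr_path

# Table structure: enabled, accurate TableFormer mode, no cell matching.
parser_pipeline_options.do_table_structure = True
parser_pipeline_options.table_structure_options.do_cell_matching = False
parser_pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE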

vku-ibm · 2025-03-07T13:37:52Z

@pbonito What I've seen so far is that individual documents can push GPU memory "consumption" up. The conversion settings do have an effect on that, but the main impact comes from the content of the document.
I'm planning to run some tests to see whether the number of pages, images, or tables is the main contributor here.

Regarding "limiting" how much memory can be requested: I'm not sure exactly how this can be handled, as this is a common situation with torch itself.
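One option PyTorch does offer for bounding a process's share of the GPU is `torch.cuda.set_per_process_memory_fraction`. The sketch below is an assumption about how it could be wired into a multi-worker setup, not something Docling does: `per_worker_fraction` and `cap_gpu_memory` are hypothetical helpers, and the 10% headroom is an arbitrary choice.

```python
def per_worker_fraction(num_workers, headroom=0.9):
    """Split the usable share of GPU memory evenly across workers.

    headroom is the portion of total memory handed out in aggregate;
    the remainder is left for CUDA contexts and other overhead.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return headroom / num_workers


def cap_gpu_memory(num_workers, device=0):
    """Cap this process's CUDA allocations; returns the fraction, or None if no GPU."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    fraction = per_worker_fraction(num_workers)
    # The caching allocator raises an out-of-memory error once this
    # process tries to go above its share.
    torch.cuda.set_per_process_memory_fraction(fraction, device)
    return fraction
```

A cap does not make an allocation wait. It only means that a worker hits its own limit instead of exhausting memory that the other processes on the same GPU need, so a memory-heavy document can still fail in that worker.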

pbonito · 2025-03-08T08:44:33Z

@PeterStaar-IBM Same error with 2.24.0

pbonito added the bug label ("Something isn't working") Mar 6, 2025

PeterStaar-IBM assigned vku-ibm Mar 7, 2025
