
Addition of local vlm folder support #1051

Open
wants to merge 2 commits into base: main
Conversation

@Navanit-git Navanit-git commented Feb 25, 2025

Hi, this PR lets the user point Docling at a local repository or a model already downloaded to a path on disk.

I have run this locally:

```
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionVlmOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True
pipeline_options.picture_description_options = PictureDescriptionVlmOptions(
    repo_id="/opt/nav/Qwen/Qwen2.5-VL-7B-Instruct",  # <-- local path to the model instead of a Hub repo_id
    prompt="Extract the text from the images; if it is a table, extract it in table format. If there is no text, give a 'No Image Text' response.",
)
pipeline_options.images_scale = 2.0
pipeline_options.generate_picture_images = True

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options,
        )
    }
)
```
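The one-line behaviour this PR adds can be sketched roughly as follows (a minimal illustration, not the actual docling code; `resolve_model_path` and the `download` callable are hypothetical names, with `download` standing in for `huggingface_hub.snapshot_download`):

```python
from pathlib import Path

def resolve_model_path(repo_id, download):
    """If repo_id is an existing local directory, use it as-is;
    otherwise fall back to downloading from the Hugging Face Hub."""
    local = Path(repo_id)
    if local.is_dir():
        return str(local)  # pre-downloaded model: skip the Hub entirely
    return download(repo_id)
```

With this, passing a filesystem path as `repo_id` (as in the snippet above) short-circuits the download, while genuine Hub ids behave as before.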

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.


Signed-off-by: Navanit Dubey <[email protected]>

mergify bot commented Feb 25, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce conventional commit

This rule is failing.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
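Illustratively, the rule rejects this PR's current title while accepting conventional ones (checked here with Python's `re`; the pattern is copied from the bullet above):

```python
import re

# Conventional-commit title pattern quoted in the merge protection above.
PATTERN = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)

assert PATTERN.match("feat: add local vlm folder support")
assert PATTERN.match("fix(pdf)!: handle local model paths")
assert not PATTERN.match("Addition of local vlm folder support")
```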

@Navanit-git Navanit-git changed the title Feature to use local vlm model too Addition of local vlm folder support Feb 25, 2025
Signed-off-by: Navanit Dubey <[email protected]>
@dolfim-ibm dolfim-ibm self-requested a review February 25, 2025 09:31
@Navanit-git
Author

@dolfim-ibm any review?

@dolfim-ibm dolfim-ibm mentioned this pull request Feb 25, 2025
@dolfim-ibm
Contributor

@dolfim-ibm any review?

Could you please see if #1057 would be enough for your use case?

We just started the approach of using artifacts_path consistently across models. In short: if it is defined, it will load all models locally.

@Navanit-git
Author

@dolfim-ibm any review?

Could you please see if #1057 would be enough for your use case?

We just started the approach of using artifacts_path consistently across models. In short: if it is defined, it will load all models locally.

I don't think this will resolve the error; the local file will be downloaded again.

@Navanit-git
Author

Navanit-git commented Feb 25, 2025

@dolfim-ibm #1057 doesn't solve this via the artifacts path, since it then wants to download easyocr and the other models into that path. So the artifacts path only changes the default location.

This PR instead makes the path dynamic for every OCR model, using the repo id.

cc @cau-git

@cau-git
Contributor

cau-git commented Feb 26, 2025

@Navanit-git From what I can see, your proposed change would not have much effect:

  1. If you use the Hugging Face snapshot_download, it will only re-transfer the actual assets if the content found in the cache dir or the local_dir (if provided) is not current.
  2. Your change is applied only for the PictureDescriptionVlmModel, and no other model.

We established the CLI model downloader, and an analogous model download API to make it easy to pre-download models. However, if you want to work with pre-downloaded models and provide an artifacts_path to the converter, it will no longer even check on HuggingFace for new weights. This is the intended behaviour, which works well also for container environments where you might have non-standard model artifacts location (e.g. in writeable directory) or no networking at runtime.
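The `artifacts_path` behaviour described above can be sketched like this (purely illustrative; `resolve_weights` is a hypothetical helper, not docling's implementation):

```python
from pathlib import Path

def resolve_weights(repo_id, artifacts_path=None, download=None):
    # With artifacts_path set, the model folder is resolved locally and
    # the Hub is never contacted; this also works fully offline.
    if artifacts_path is not None:
        return str(Path(artifacts_path) / repo_id.split("/")[-1])
    # Otherwise defer to the downloader (which may reuse its cache).
    return download(repo_id)
```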

Could you please explain what functionality you miss, given this update?

@Navanit-git
Author

@Navanit-git From what I can see, your proposed change would not have much effect:

  1. If you use the Hugging Face snapshot_download, it will only re-transfer the actual assets if the content found in the cache dir or the local_dir (if provided) is not current.
  2. Your change is applied only for the PictureDescriptionVlmModel, and no other model.

We established the CLI model downloader, and an analogous model download API to make it easy to pre-download models. However, if you want to work with pre-downloaded models and provide an artifacts_path to the converter, it will no longer even check on HuggingFace for new weights. This is the intended behaviour, which works well also for container environments where you might have non-standard model artifacts location (e.g. in writeable directory) or no networking at runtime.

Could you please explain what functionality you miss, given this update?

Hey, thank you for the review. Basically, I have a PDF that I want to OCR using a VLM model, so I followed the steps at the link below:
https://ds4sd.github.io/docling/examples/pictures_description/

I had already downloaded the VLM model to a local path, which is not in a cache folder. So when I give the repo_id of that VLM model, I get an HF error, since it starts downloading. To support this, I added one line: if the repo_id path already exists, don't download; just use the path.
Regarding updating the weights: if there is any update to a model, the weights (or any other change) are picked up automatically whenever the model is loaded with transformers.

Yes, I know this is a minor change. I have a project to deliver where I have to OCR PDF files with images into md/text files, and I thought this simple change could help. If it's redundant, kindly close this PR.

Also, one small request: when can we expect the image description in the Markdown export? Instead of the image placeholder we could get the image description. I was working on that by changing your docling_core library, and I think I am very close, but there is a time crunch for me on my project.

@Navanit-git
Author

@dolfim-ibm @cau-git ...

@dolfim-ibm
Contributor

export to markdown format when can we expect the image description

This is coming very soon.

@dolfim-ibm
Contributor

Regarding the overall PR, my opinion in a few bullets:

  1. If you download a model from HF, you could simply move it to the folder with all the other artifacts. The fact that the default is called a cache is just naming; it is simply a folder with all the artifacts.
  2. Using the repo_id as a path might run into unknown issues. A user might have a folder that happens to match some random HF model, and then the load would not work.
  3. The idea of allowing a local path is good. Maybe we add a more explicit argument for it, instead of reusing repo_id.
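A more explicit option along the lines of bullet 3 might look like this (purely a sketch; `VlmOptions`, `local_path`, and `resolve` are hypothetical names, not docling API):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class VlmOptions:
    # Keep repo_id for Hub models; add a separate, explicit field for
    # pre-downloaded local folders, so a directory that happens to match
    # a repo_id cannot be picked up by accident.
    repo_id: Optional[str] = None
    local_path: Optional[str] = None

    def resolve(self) -> str:
        if self.local_path is not None:
            if not Path(self.local_path).is_dir():
                raise FileNotFoundError(self.local_path)
            return self.local_path
        if self.repo_id is None:
            raise ValueError("either repo_id or local_path must be set")
        return f"hub:{self.repo_id}"  # stand-in for a Hub download
```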

@Navanit-git
Author

Regarding the overall PR, my opinion in a few bullets:

  1. If you download a model from HF, you could simply move it to the folder with all the other artifacts. The fact that the default is called a cache is just naming; it is simply a folder with all the artifacts.
  2. Using the repo_id as a path might run into unknown issues. A user might have a folder that happens to match some random HF model, and then the load would not work.
  3. The idea of allowing a local path is good. Maybe we add a more explicit argument for it, instead of reusing repo_id.

Thank you @dolfim-ibm. For now, I have made the above PR changes to the docling library in my local setup and it's working fine. But we never know if we will hit an error in the future; let's see if we can add patches for it then.
