
Addition of local vlm folder support #1051

Open
wants to merge 2 commits into base: main
Conversation

@Navanit-git Navanit-git commented Feb 25, 2025

Hi, this PR lets the user point Docling at a local repository or a model already downloaded to a path on disk.

I have run this locally:

```
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionVlmOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True
pipeline_options.picture_description_options = PictureDescriptionVlmOptions(
    repo_id="/opt/nav/Qwen/Qwen2.5-VL-7B-Instruct",  # <-- local path to the model instead of a Hub repo_id
    prompt="Extract the text from the images; if it is a table, extract it in table format. If there is no text, give a 'No Image Text' response.",
)
pipeline_options.images_scale = 2.0
pipeline_options.generate_picture_images = True

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options,
        )
    }
)
```
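The one-line behaviour this PR adds can be sketched roughly as follows (a minimal illustration, not the actual docling code; `resolve_model_path` and the `download` callable are hypothetical names, with `download` standing in for `huggingface_hub.snapshot_download`):

```python
from pathlib import Path

def resolve_model_path(repo_id, download):
    """If repo_id is an existing local directory, use it as-is;
    otherwise fall back to downloading from the Hugging Face Hub."""
    local = Path(repo_id)
    if local.is_dir():
        return str(local)  # pre-downloaded model: skip the Hub entirely
    return download(repo_id)
```

With this, passing a filesystem path as `repo_id` (as in the snippet above) short-circuits the download, while genuine Hub ids behave as before.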

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.


Signed-off-by: Navanit Dubey <[email protected]>

mergify bot commented Feb 25, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce conventional commit

This rule is failing.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
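Illustratively, the rule rejects this PR's current title while accepting conventional ones (checked here with Python's `re`; the pattern is copied from the bullet above):

```python
import re

# Conventional-commit title pattern quoted in the merge protection above.
PATTERN = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)

assert PATTERN.match("feat: add local vlm folder support")
assert PATTERN.match("fix(pdf)!: handle local model paths")
assert not PATTERN.match("Addition of local vlm folder support")
```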

@Navanit-git Navanit-git changed the title Feature to use local vlm model too Addition of local vlm folder support Feb 25, 2025
Signed-off-by: Navanit Dubey <[email protected]>
@dolfim-ibm dolfim-ibm self-requested a review February 25, 2025 09:31
@Navanit-git
Author

@dolfim-ibm any review?

@dolfim-ibm dolfim-ibm mentioned this pull request Feb 25, 2025
@dolfim-ibm
Contributor

@dolfim-ibm any review?

Could you please see if #1057 would be enough for your use case?

We just started the approach of using artifacts_path consistently across models. In short: if it is defined, it will load all models locally.

@Navanit-git
Author

@dolfim-ibm any review?

Could you please see if #1057 would be enough for your use case?

We just started the approach of using artifacts_path consistently across models. In short: if it is defined, it will load all models locally.

I don't think this will resolve the error; the local file will be downloaded again.

@Navanit-git
Author

Navanit-git commented Feb 25, 2025

@dolfim-ibm #1057 doesn't solve this via the artifacts path, since it then wants to download easyocr and the other models into that path. So the artifacts path only changes the default location.

This PR instead makes the path dynamic for every OCR model, using the repo id.

cc @cau-git

@cau-git
Contributor

cau-git commented Feb 26, 2025

@Navanit-git From what I can see, your proposed change would not have much effect:

  1. If you use the Hugging Face snapshot_download, it will only re-transfer the actual assets if the content found in the cache dir or the local_dir (if provided) is not current.
  2. Your change is applied only for the PictureDescriptionVlmModel, and no other model.

We established the CLI model downloader, and an analogous model download API to make it easy to pre-download models. However, if you want to work with pre-downloaded models and provide an artifacts_path to the converter, it will no longer even check on HuggingFace for new weights. This is the intended behaviour, which works well also for container environments where you might have non-standard model artifacts location (e.g. in writeable directory) or no networking at runtime.
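The `artifacts_path` behaviour described above can be sketched like this (purely illustrative; `resolve_weights` is a hypothetical helper, not docling's implementation):

```python
from pathlib import Path

def resolve_weights(repo_id, artifacts_path=None, download=None):
    # With artifacts_path set, the model folder is resolved locally and
    # the Hub is never contacted; this also works fully offline.
    if artifacts_path is not None:
        return str(Path(artifacts_path) / repo_id.split("/")[-1])
    # Otherwise defer to the downloader (which may reuse its cache).
    return download(repo_id)
```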

Could you please explain what functionality you miss, given this update?

@Navanit-git
Author

@Navanit-git From what I can see, your proposed change would not have much effect:

  1. If you use the Hugging Face snapshot_download, it will only re-transfer the actual assets if the content found in the cache dir or the local_dir (if provided) is not current.
  2. Your change is applied only for the PictureDescriptionVlmModel, and no other model.

We established the CLI model downloader, and an analogous model download API to make it easy to pre-download models. However, if you want to work with pre-downloaded models and provide an artifacts_path to the converter, it will no longer even check on HuggingFace for new weights. This is the intended behaviour, which works well also for container environments where you might have non-standard model artifacts location (e.g. in writeable directory) or no networking at runtime.

Could you please explain what functionality you miss, given this update?

Hey, thank you for the review. Basically, I have a PDF that I want to OCR using a VLM model, so I followed the steps at the link below:
https://ds4sd.github.io/docling/examples/pictures_description/

I had already downloaded the VLM model to a local path, which is not in a cache folder. So when I give the repo_id of that VLM model, I get an HF error, since it starts downloading. To support this, I added one line: if the repo_id path already exists, don't download; just use the path.
Regarding updating the weights: if there is any update to a model, the weights (or any other change) are picked up automatically whenever the model is loaded with transformers.

Yes, I know this is a minor change. I have a project to deliver where I have to OCR PDF files with images into md/text files, and I thought this simple change could help. If it's redundant, kindly close this PR.

Also, one small request: when can we expect the image description in the Markdown export? Instead of the image placeholder we could get the image description. I was working on that by changing your docling_core library, and I think I am very close, but there is a time crunch for me on my project.

@Navanit-git
Author

@dolfim-ibm @cau-git ...

@dolfim-ibm
Contributor

export to markdown format when can we expect the image description

This is coming very soon.

@dolfim-ibm
Contributor

Regarding the overall PR, my opinion in a few bullets:

  1. If you download a model from HF, you could simply move it to the folder with all the other artifacts. The fact that the default is called a cache is just naming; it is simply a folder with all the artifacts.
  2. Using the repo_id as a path might run into unknown issues. A user might have a folder that happens to match some random HF model, and then the load would not work.
  3. The idea of allowing a local path is good. Maybe we add a more explicit argument for it, instead of reusing repo_id.
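A more explicit option along the lines of bullet 3 might look like this (purely a sketch; `VlmOptions`, `local_path`, and `resolve` are hypothetical names, not docling API):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class VlmOptions:
    # Keep repo_id for Hub models; add a separate, explicit field for
    # pre-downloaded local folders, so a directory that happens to match
    # a repo_id cannot be picked up by accident.
    repo_id: Optional[str] = None
    local_path: Optional[str] = None

    def resolve(self) -> str:
        if self.local_path is not None:
            if not Path(self.local_path).is_dir():
                raise FileNotFoundError(self.local_path)
            return self.local_path
        if self.repo_id is None:
            raise ValueError("either repo_id or local_path must be set")
        return f"hub:{self.repo_id}"  # stand-in for a Hub download
```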

@Navanit-git
Author

Regarding the overall PR, my opinion in a few bullets:

  1. If you download a model from HF, you could simply move it to the folder with all the other artifacts. The fact that the default is called a cache is just naming; it is simply a folder with all the artifacts.
  2. Using the repo_id as a path might run into unknown issues. A user might have a folder that happens to match some random HF model, and then the load would not work.
  3. The idea of allowing a local path is good. Maybe we add a more explicit argument for it, instead of reusing repo_id.

Thank you @dolfim-ibm. For now, I have made the above PR changes to the docling library in my local setup and it's working fine. But we never know if we will hit an error in the future; let's see if we can add patches for it then.
