Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add readingorder model #44

Merged
merged 32 commits into from
Feb 20, 2025
Merged

feat: Add readingorder model #44

merged 32 commits into from
Feb 20, 2025

Conversation

PeterStaar-IBM
Copy link
Contributor

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the
    conventional commits.
  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

@mllife
Copy link

mllife commented Nov 26, 2024

Are you rewriting the C++ code from ds_glm_model for ordering as references here; DS4SD/docling#361 (reply in thread)

Copy link

mergify bot commented Dec 13, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

@PeterStaar-IBM PeterStaar-IBM changed the title added ReadingOrder model feat: Add readingorder model Feb 5, 2025
dolfim-ibm and others added 4 commits February 5, 2025 16:45
@PeterStaar-IBM PeterStaar-IBM marked this pull request as ready for review February 6, 2025 13:09
print("true: ", str(true_elem), ", rand: ", str(rand_elem))
"""

pred_elements = romodel.predict_reading_order(page_elements=rand_elements)
Copy link
Contributor

@cau-git cau-git Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, the output of the predict_reading_order is a permutation of the page_elements it accepts as an input. This argument is a list of objects following a custom PageElement data model, combined from BoundingBox and DocItem attributes.
To make sure there is a consistent interface for DoclingDocument, the test unit should demonstrate how one needs to apply this resorted page_elements list to the DoclingDocument for all the supported operations:

  • Change the order of DocItem instances in the body of a DoclingDocument
  • Change multiple separate DocItem instances to a single DocItem instance when they must be merged (i.e. two or more provenances in one item)
  • Change a DocItem to have a caption or footnote etc. referencing to it.

(All of this obviously also needs API extensions on DoclingDocument)

Ultimately, the higher-level reading-order model must have an API that accepts a DoclingDocument and returns a new DoclingDocument with correct order. That does not necessarily need to happen in docling-ibm-models but it should be as easy as possible outside in case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, the higher-level reading-order model must have an API that accepts a DoclingDocument and returns a new DoclingDocument with correct order

High-level I agree with the statement, but how do we see happening the merging of items? E.g. the paragraphs split among columns or pages? They are very likely identified only after the pieces are in the correct reading-order, which means the DoclingDocument items are not yet completely defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cau-git I dont think you fully grasp how we design the reading-order model. It seperates out in different tasks, in order to not get confused.

  1. It orders a set of PageElements. This ensures that we always return exactly the same number of elements as we get.
  2. It generates links between PageElements (to_caption, to_footnote, merge) in order to collapse PageElements into DocItems

We could have a higher level method that takes a DoclingDocument and returns an ordered one, but we also need the low-level, so we can measure the accuracy of the methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clearly understand why we need the low level methods. What I want to understand is if we want to start from a DoclingDocument representation (like this test assumes) or from an internal representation like what the layout model produces. The current state would allow for both.

PeterStaar-IBM and others added 2 commits February 7, 2025 17:29
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
@cau-git cau-git requested a review from dolfim-ibm February 19, 2025 14:40
dolfim-ibm
dolfim-ibm previously approved these changes Feb 19, 2025
Copy link
Contributor

@dolfim-ibm dolfim-ibm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 🚀

Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
cau-git
cau-git previously approved these changes Feb 19, 2025
Saidgurbuz
Saidgurbuz previously approved these changes Feb 19, 2025
dolfim-ibm
dolfim-ibm previously approved these changes Feb 19, 2025
Signed-off-by: Christoph Auer <[email protected]>
@cau-git cau-git dismissed stale reviews from nikos-livathinos, dolfim-ibm, Saidgurbuz, and themself via ffa4adf February 19, 2025 16:26
Signed-off-by: Christoph Auer <[email protected]>
@cau-git cau-git merged commit 23c1696 into main Feb 20, 2025
8 checks passed
@cau-git cau-git deleted the dev/add-reading-order branch February 20, 2025 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants