-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add readingorder model #44
Conversation
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Are you rewriting the C++ code from ds_glm_model for ordering as references here; DS4SD/docling#361 (reply in thread) |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
🟢 Require two reviewer for test updatesWonderful, this rule succeeded.When test data is updated, we require two reviewers
|
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Nikos Livathinos <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
… python 3.13 Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Michele Dolfi <[email protected]>
fix pyproject pining
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Peter Staar <[email protected]>
print("true: ", str(true_elem), ", rand: ", str(rand_elem)) | ||
""" | ||
|
||
pred_elements = romodel.predict_reading_order(page_elements=rand_elements) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, the output of the predict_reading_order
is a permutation of the page_elements
it accepts as an input. This argument is a list of objects following a custom PageElement
data model, combined from BoundingBox and DocItem attributes.
To make sure there is a consistent interface for DoclingDocument
, the test unit should demonstrate how one needs to apply this resorted page_elements list to the DoclingDocument for all the supported operations:
- Change the order of
DocItem
instances in the body of aDoclingDocument
- Change multiple separate
DocItem
instances to a singleDocItem
instance when they must be merged (i.e. two or more provenances in one item) - Change a
DocItem
to have a caption or footnote etc. referencing to it.
(All of this obviously also needs API extensions on DoclingDocument)
Ultimately, the higher-level reading-order model must have an API that accepts a DoclingDocument and returns a new DoclingDocument with correct order. That does not necessarily need to happen in docling-ibm-models but it should be as easy as possible outside in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately, the higher-level reading-order model must have an API that accepts a DoclingDocument and returns a new DoclingDocument with correct order
High-level I agree with the statement, but how do we see happening the merging of items? E.g. the paragraphs split among columns or pages? They are very likely identified only after the pieces are in the correct reading-order, which means the DoclingDocument items are not yet completely defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cau-git I dont think you fully grasp how we design the reading-order model. It seperates out in different tasks, in order to not get confused.
- It orders a set of PageElements. This ensures that we always return exactly the same number of elements as we get.
- It generates links between PageElements (to_caption, to_footnote, merge) in order to collapse PageElements into DocItems
We could have a higher level method that takes a DoclingDocument and returns an ordered one, but we also need the low-level, so we can measure the accuracy of the methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I clearly understand why we need the low level methods. What I want to understand is if we want to start from a DoclingDocument representation (like this test assumes) or from an internal representation like what the layout model produces. The current state would allow for both.
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm 🚀
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
ffa4adf
Signed-off-by: Christoph Auer <[email protected]>
Checklist:
conventional commits.