Data modelling (Datenmodellierung) #3

Open
blrtvs opened this issue Nov 26, 2018 · 2 comments

blrtvs commented Nov 26, 2018

No description provided.

maxnth commented Jan 20, 2019

As I understand it, the proposed data model only works with PAGE XML files whose text regions are described by at most 4 coordinate points. That isn't always the case, e.g. when polygons are used for page segmentation in tools like Aletheia or LAREX.

I attached an example file, produced by OCR4ALL, in which a region has more than 4 coordinate points.
0006.zip
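
A minimal sketch of how region outlines could be read without assuming a fixed number of points. It is not taken from the attached file: the sample XML, the region id and the function names (`local_name`, `parse_points`, `region_polygons`) are made up for illustration, and the namespace is just one PAGE schema version chosen as an example. It assumes the outline is stored either in the `points` attribute of `Coords` (as "x,y x,y ...") or, as in older PAGE versions, as child `Point` elements.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]

def parse_points(coords):
    """Return a list of (x, y) tuples of any length from a Coords element."""
    points_attr = coords.get("points")
    if points_attr:
        # Newer PAGE versions: points="x1,y1 x2,y2 ..."
        return [tuple(int(v) for v in pair.split(",")) for pair in points_attr.split()]
    # Older PAGE versions: <Point x=".." y=".."/> children
    return [(int(p.get("x")), int(p.get("y")))
            for p in coords if local_name(p.tag) == "Point"]

def region_polygons(xml_text):
    """Map each TextRegion id to its outline polygon."""
    root = ET.fromstring(xml_text)
    polygons = {}
    for el in root.iter():
        if local_name(el.tag) != "TextRegion":
            continue
        for child in el:
            if local_name(child.tag) == "Coords":
                polygons[el.get("id")] = parse_points(child)
                break
    return polygons

# Hypothetical sample with a six-point polygon (not the content of 0006.zip).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2017-07-15">
  <Page imageFilename="example.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r0">
      <Coords points="10,10 200,10 220,60 200,120 10,120 5,60"/>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for region_id, polygon in region_polygons(SAMPLE).items():
        print(region_id, len(polygon), polygon)
```

Storing the outline as a variable-length list of points (rather than four fixed corner fields) would cover rectangles and polygons with the same structure.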

maxnth self-assigned this Jan 21, 2019

maxnth commented Jan 21, 2019

We should also keep in mind that a page can have more than one "TextRegion" element, and that the reading order of those regions can play an important part in analyzing the extracted text.
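
A rough sketch of what handling several regions plus a reading order could look like. The sample XML, the ids and the function names are hypothetical, and the code makes a simplifying assumption: the page has a single flat `OrderedGroup` of `RegionRefIndexed` entries under `ReadingOrder`. Nested or unordered groups would need extra handling. Regions that are not listed in the reading order are kept in document order after the listed ones.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]

def regions_in_reading_order(xml_text):
    """Return TextRegion ids sorted by their RegionRefIndexed index."""
    root = ET.fromstring(xml_text)

    # regionRef -> index, taken from the ReadingOrder section
    order = {}
    for el in root.iter():
        if local_name(el.tag) == "RegionRefIndexed":
            order[el.get("regionRef")] = int(el.get("index"))

    # All TextRegion elements in document order (nested ones included)
    region_ids = [el.get("id") for el in root.iter()
                  if local_name(el.tag) == "TextRegion"]

    # sorted() is stable, so unlisted regions keep their document order at the end
    return sorted(region_ids, key=lambda rid: order.get(rid, float("inf")))

# Hypothetical sample: three regions, two of them listed in the reading order.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2017-07-15">
  <Page imageFilename="example.png" imageWidth="1000" imageHeight="1500">
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="0" regionRef="r2"/>
        <RegionRefIndexed index="1" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1"><Coords points="10,500 400,500 400,900 10,900"/></TextRegion>
    <TextRegion id="r2"><Coords points="10,10 400,10 400,400 10,400"/></TextRegion>
    <TextRegion id="r3"><Coords points="500,10 900,10 900,900 500,900"/></TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(regions_in_reading_order(SAMPLE))
```

For the data model this suggests a one-to-many relation from page to region, with an optional order index per region.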
