As I understand it, the proposed data model only works with PAGE XML files whose text regions have at most 4 coordinates. That isn't always the case, e.g. when polygons are used for page segmentation in tools like Aletheia or LAREX.
I attached an example file with more than 4 coordinates which was produced by OCR4ALL. 0006.zip
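To illustrate the point about polygons, here is a minimal sketch of reading a region outline with any number of points. It assumes the outline is stored as a `points` attribute of the form `x1,y1 x2,y2 ...` on the `Coords` element, which is my understanding of the newer PAGE schema versions. The helper names `parse_points` and `bounding_box` are hypothetical and not part of the proposed data model.

```python
def parse_points(points: str) -> list[tuple[int, int]]:
    """Parse a PAGE-style points string such as "0,0 10,0 10,10" into (x, y) pairs.

    Assumption: points are whitespace-separated "x,y" tokens with integer values.
    """
    polygon = []
    for token in points.split():
        x, y = token.split(",")
        polygon.append((int(x), int(y)))
    return polygon


def bounding_box(polygon: list[tuple[int, int]]) -> tuple[int, int, int, int]:
    """Return (min_x, min_y, max_x, max_y) for a polygon with any number of points."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# A five-point outline: storing only four corners would lose the extra vertex.
outline = parse_points("0,0 10,0 12,5 10,10 0,10")
box = bounding_box(outline)
```

If the data model really needs a rectangle, deriving a bounding box like this keeps the full polygon available while still providing 4 corner values.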
We should also keep in mind that a page can have more than one `TextRegion` element, and that the reading order can play an important part in analyzing the extracted text.
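As a rough sketch of what handling several regions and their reading order could look like, the snippet below collects every `TextRegion` and sorts them by the `index` of the matching `RegionRefIndexed` entry. The sample XML is made up for illustration (it is not the attached file), the function name `regions_in_reading_order` is hypothetical, and only a flat `OrderedGroup` is handled. Tags are matched by local name so the exact PAGE namespace version does not matter.

```python
import xml.etree.ElementTree as ET

# Made-up sample: two regions whose reading order differs from document order.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="100">
    <ReadingOrder>
      <OrderedGroup id="g0">
        <RegionRefIndexed index="0" regionRef="r2"/>
        <RegionRefIndexed index="1" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1">
      <Coords points="0,50 100,50 100,100 0,100"/>
      <TextEquiv><Unicode>second</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2">
      <Coords points="0,0 100,0 100,40 50,45 0,40"/>
      <TextEquiv><Unicode>first</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def local(element: ET.Element) -> str:
    """Return the tag name without its namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]


def regions_in_reading_order(xml_text: str) -> list[tuple[str, str]]:
    """Return (region id, text) pairs sorted by reading order.

    Regions that are not listed in the reading order keep their
    document order and are placed after the listed ones.
    """
    root = ET.fromstring(xml_text)
    texts: dict[str, str] = {}
    order: dict[str, int] = {}
    for element in root.iter():
        name = local(element)
        if name == "TextRegion":
            # Only look at the region-level TextEquiv (a direct child).
            text = ""
            for child in element:
                if local(child) == "TextEquiv":
                    for grandchild in child:
                        if local(grandchild) == "Unicode":
                            text = grandchild.text or ""
            texts[element.get("id")] = text
        elif name == "RegionRefIndexed":
            order[element.get("regionRef")] = int(element.get("index"))
    ids = sorted(texts, key=lambda rid: (rid not in order, order.get(rid, 0)))
    return [(rid, texts[rid]) for rid in ids]


ordered = regions_in_reading_order(SAMPLE)
```

Here `ordered` lists region `r2` before `r1`, even though `r1` comes first in the file.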