Data modeling #3

blrtvs · 2018-11-26T16:53:58Z

No description provided.

maxnth · 2019-01-20T20:27:20Z

As I understand it, the proposed data model only works with PAGE XML files whose text regions have at most 4 coordinate points. That isn't always the case, e.g. when polygons are used for page segmentation in tools like Aletheia or LAREX.

I attached an example file, produced by OCR4ALL, in which regions have more than 4 coordinate points.
0006.zip
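
To make the point concrete, here is a minimal sketch of reading region outlines as polygons of arbitrary length rather than fixed 4-point boxes. It assumes the newer PAGE XML layout in which `Coords` carries a `points` attribute of the form `x1,y1 x2,y2 ...`. The sample document, the namespace version, and the helper names (`parse_points`, `region_polygons`, `bounding_box`) are made up for illustration and are not the content of the attached file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; not the attached 0006.zip. The namespace version is an assumption.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r0">
      <Coords points="10,10 200,12 210,80 120,95 15,85 8,40"/>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points_attr):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples of any length."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points_attr.split()]


def region_polygons(xml_text):
    """Map each TextRegion id to its outline polygon."""
    root = ET.fromstring(xml_text)
    polygons = {}
    for elem in root.iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        # Only look at the region's own Coords, not those of nested lines or words.
        for child in elem:
            if local_name(child.tag) == "Coords":
                polygons[elem.get("id")] = parse_points(child.get("points", ""))
                break
    return polygons


def bounding_box(polygon):
    """Derive (min_x, min_y, max_x, max_y) if a rectangle is still needed downstream."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


polygons = region_polygons(SAMPLE)
print(len(polygons["r0"]), bounding_box(polygons["r0"]))
```

Storing the full point list and deriving a bounding box on demand keeps the model usable for both rectangular and polygonal segmentation.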

maxnth · 2019-01-21T01:14:39Z

We should also keep in mind that a page can contain more than one "TextRegion" element, and that the reading order can play an important part in analyzing the extracted text.
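
A rough sketch of what that could mean for the model: collect every `TextRegion` on the page and sort the regions by the `ReadingOrder` element instead of by document order. It assumes a single flat `OrderedGroup` of `RegionRefIndexed` entries (nested groups are not handled). The sample document and the function name `regions_in_reading_order` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: three regions, document order differs from reading order.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.png" imageWidth="1000" imageHeight="1500">
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="0" regionRef="r2"/>
        <RegionRefIndexed index="1" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1">
      <Coords points="10,300 400,300 400,500 10,500"/>
      <TextEquiv><Unicode>second paragraph</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2">
      <Coords points="10,10 400,10 400,200 10,200"/>
      <TextEquiv><Unicode>first paragraph</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r3">
      <Coords points="10,600 400,600 400,700 10,700"/>
      <TextEquiv><Unicode>unlisted</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def region_text(region):
    """Return the region-level TextEquiv/Unicode text, ignoring nested lines and words."""
    for child in region:
        if local_name(child.tag) == "TextEquiv":
            for sub in child:
                if local_name(sub.tag) == "Unicode":
                    return sub.text or ""
    return ""


def regions_in_reading_order(xml_text):
    """Return (region id, text) pairs sorted by ReadingOrder index."""
    root = ET.fromstring(xml_text)
    texts = {}   # region id -> text, in document order
    order = []   # (index, region id) pairs from ReadingOrder
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "TextRegion":
            texts[elem.get("id")] = region_text(elem)
        elif name == "RegionRefIndexed":
            order.append((int(elem.get("index")), elem.get("regionRef")))
    ordered_ids = [ref for _, ref in sorted(order)]
    # Regions missing from ReadingOrder are appended in document order.
    ordered_ids += [rid for rid in texts if rid not in ordered_ids]
    return [(rid, texts[rid]) for rid in ordered_ids if rid in texts]


for region_id, text in regions_in_reading_order(SAMPLE):
    print(region_id, text)
```

Appending unlisted regions at the end is only one possible policy; the data model could just as well flag them for review.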

maxnth self-assigned this Jan 21, 2019
