
run with font class priors #7

Open
bertsky opened this issue Mar 4, 2023 · 5 comments
bertsky commented Mar 4, 2023

It would be really nice if it were possible to constrain the font predictions to classes known in advance. This could be implemented in the OCR-D wrapper by suppressing certain results from the prediction, but ideally it's passed on to the neural network decoder so all the probability mass gets reassigned.

For example, if I know the document only contains Fraktur and Antiqua, or Hebrew and Greek, or Antiqua and Italic and Manuscript, or Gotico-Antiqua and Schwabacher, then I don't want to risk "surprise" outliers (or systematic misclassification as in the Greek-Italic example).
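For illustration, a minimal sketch (not part of this repo; function and class names are hypothetical) of what such a constraint could look like at prediction time, assuming a PyTorch classifier that emits one logit per font class: logits of classes outside the whitelist are masked to `-inf`, so the softmax reassigns their probability mass to the allowed classes.

```python
# Hypothetical sketch: restrict predictions to a whitelist of font classes
# by masking logits before the softmax, so the suppressed classes get zero
# probability and the remaining mass is renormalized over the allowed ones.
import torch

def masked_softmax(logits: torch.Tensor, allowed: set, class_names: list) -> torch.Tensor:
    """logits: (batch, num_classes); allowed: subset of class_names."""
    mask = torch.full_like(logits, float("-inf"))
    for i, name in enumerate(class_names):
        if name in allowed:
            mask[:, i] = 0.0
    return torch.softmax(logits + mask, dim=-1)

# e.g. the document is known to contain only Fraktur and Antiqua
classes = ["Antiqua", "Fraktur", "Italic", "Greek", "Hebrew"]
probs = masked_softmax(torch.randn(1, len(classes)), {"Antiqua", "Fraktur"}, classes)
```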

GemCarr commented Mar 13, 2023

This option can easily be added for the font classifier, which could improve performance and will ensure these classes never appear in the predictions, so I will do that.
For COCR, unfortunately, there is no easy way to add it for now, as we don't have dedicated parts of the model for specific fonts that we could 'turn off'.

GemCarr closed this as completed Mar 13, 2023

bertsky commented Mar 13, 2023

> For COCR, unfortunately, there is no easy way to add it for now, as we don't have dedicated parts of the model for specific fonts that we could 'turn off'.

Yes, I guess that would require changing the COCR network, with an input-as-output scheme (i.e. representing the font as an additional output dimension).
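Purely for illustration (a hypothetical PyTorch sketch, not the actual COCR architecture), I mean something like a shared backbone with a second head that emits per-frame font-class logits next to the character logits:

```python
# Hypothetical sketch of the "input-as-output" idea: one shared backbone,
# one head for characters and one for font classes, so a known font prior
# could later be injected or enforced on the font head.
import torch
import torch.nn as nn

class OcrWithFontHead(nn.Module):
    def __init__(self, feat_dim: int, num_chars: int, num_fonts: int):
        super().__init__()
        self.backbone = nn.GRU(feat_dim, 256, batch_first=True, bidirectional=True)
        self.char_head = nn.Linear(512, num_chars)  # per-frame character logits
        self.font_head = nn.Linear(512, num_fonts)  # per-frame font-class logits

    def forward(self, x):  # x: (batch, frames, feat_dim)
        feats, _ = self.backbone(x)
        return self.char_head(feats), self.font_head(feats)
```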

GemCarr commented Mar 13, 2023

It's also linked to the training process of the model: at the beginning we have dedicated modules specialized on specific fonts, but then the whole network is fine-tuned at once. The result is a kind of interlinked structure where these modules are no longer specialized on specific fonts, but most probably on a mix of fonts, etc.
Maybe the structure can be modified to ignore specific classes at a later point; I will investigate.

bertsky commented Mar 13, 2023

> It's also linked to the training process of the model: at the beginning we have dedicated modules specialized on specific fonts, but then the whole network is fine-tuned at once. The result is a kind of interlinked structure where these modules are no longer specialized on specific fonts, but most probably on a mix of fonts, etc.

Oh, interesting. I do think this would still be compatible with an input-as-output extension. The network would simply (be forced to) learn to factor this in at every phase (perhaps with some custom regularizer).

Or you just add it as another (uninitialized) layer during the finetuning phase.
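Sketched naively (hypothetical modules and sizes, just to illustrate that second option): freeze the pretrained recognition weights and train only a freshly initialized font head during the fine-tuning phase.

```python
# Hypothetical sketch: attach a new, randomly initialized font-class head to
# an already trained recognition backbone and train only that head.
import torch
import torch.nn as nn

backbone = nn.GRU(64, 256, batch_first=True, bidirectional=True)  # stands in for the pretrained body
font_head = nn.Linear(512, 12)                                    # new, uninitialized (random) layer

for p in backbone.parameters():
    p.requires_grad = False                       # keep the OCR weights fixed
optimizer = torch.optim.Adam(font_head.parameters(), lr=1e-3)

feats, _ = backbone(torch.randn(1, 100, 64))      # (batch, frames, features)
font_logits = font_head(feats)                    # per-frame font-class logits to be trained
```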

GemCarr reopened this Mar 14, 2023
seuretm commented Mar 14, 2023

For now, we have tried once to modify the COCR architecture to also output font groups at character level, which partially (but not fully) forced the different components to specialize on different font groups. Unfortunately, it had a negative impact on the CER. Investigating this further is on our to-do list, but it will require time, as training combined OCR models isn't that fast.
