
run with font class priors #7

Open
bertsky opened this issue Mar 4, 2023 · 5 comments
bertsky commented Mar 4, 2023

It would be really nice if it were possible to constrain the font predictions to classes known in advance. This could be implemented in the OCR-D wrapper by suppressing certain results from the prediction, but ideally it's passed on to the neural network decoder so all the probability mass gets reassigned.

For example, if I know the document only contains Fraktur and Antiqua, or Hebrew and Greek, or Antiqua and Italic and Manuscript, or Gotico-Antiqua and Schwabacher, then I don't want to risk "surprise" outliers (or systematic misclassification as in the Greek-Italic example).
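For illustration, a minimal sketch (not part of this repo; function and class names are hypothetical) of what such a constraint could look like at prediction time, assuming a PyTorch classifier that emits one logit per font class: logits of classes outside the whitelist are masked to `-inf`, so the softmax reassigns their probability mass to the allowed classes.

```python
# Hypothetical sketch: restrict predictions to a whitelist of font classes
# by masking logits before the softmax, so the suppressed classes get zero
# probability and the remaining mass is renormalized over the allowed ones.
import torch

def masked_softmax(logits: torch.Tensor, allowed: set, class_names: list) -> torch.Tensor:
    """logits: (batch, num_classes); allowed: subset of class_names."""
    mask = torch.full_like(logits, float("-inf"))
    for i, name in enumerate(class_names):
        if name in allowed:
            mask[:, i] = 0.0
    return torch.softmax(logits + mask, dim=-1)

# e.g. the document is known to contain only Fraktur and Antiqua
classes = ["Antiqua", "Fraktur", "Italic", "Greek", "Hebrew"]
probs = masked_softmax(torch.randn(1, len(classes)), {"Antiqua", "Fraktur"}, classes)
```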

GemCarr commented Mar 13, 2023

This option can easily be added for the font classifier, which could improve performance and will ensure these classes never appear in the predictions, so I will do that.
For COCR, unfortunately, there is no easy way to add it for now, as we don't have dedicated parts of the model for specific fonts that we could 'turn off'.

GemCarr closed this as completed Mar 13, 2023

bertsky commented Mar 13, 2023

> For COCR, unfortunately, there is no easy way to add it for now, as we don't have dedicated parts of the model for specific fonts that we could 'turn off'.

Yes, I guess that would require changing the COCR network, with an input-as-output scheme (i.e. representing the font as an additional output dimension).
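Purely for illustration (a hypothetical PyTorch sketch, not the actual COCR architecture), I mean something like a shared backbone with a second head that emits per-frame font-class logits next to the character logits:

```python
# Hypothetical sketch of the "input-as-output" idea: one shared backbone,
# one head for characters and one for font classes, so a known font prior
# could later be injected or enforced on the font head.
import torch
import torch.nn as nn

class OcrWithFontHead(nn.Module):
    def __init__(self, feat_dim: int, num_chars: int, num_fonts: int):
        super().__init__()
        self.backbone = nn.GRU(feat_dim, 256, batch_first=True, bidirectional=True)
        self.char_head = nn.Linear(512, num_chars)  # per-frame character logits
        self.font_head = nn.Linear(512, num_fonts)  # per-frame font-class logits

    def forward(self, x):  # x: (batch, frames, feat_dim)
        feats, _ = self.backbone(x)
        return self.char_head(feats), self.font_head(feats)
```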

GemCarr commented Mar 13, 2023

It's also linked to the training process of the model: at the beginning we have dedicated modules specialized on specific fonts, but then the whole network is fine-tuned at once. The result is a kind of interlinked structure where these modules are no longer specialized on specific fonts, but most probably on a mix of fonts, etc.
Maybe the structure can be modified to ignore specific classes at a later point; I will investigate.

bertsky commented Mar 13, 2023

> It's also linked to the training process of the model: at the beginning we have dedicated modules specialized on specific fonts, but then the whole network is fine-tuned at once. The result is a kind of interlinked structure where these modules are no longer specialized on specific fonts, but most probably on a mix of fonts, etc.

Oh, interesting. I do think this would still be compatible with an input-as-output extension. The network would simply (be forced to) learn to factor this in at every phase (perhaps with some custom regularizer).

Or you just add it as another (uninitialized) layer during the finetuning phase.
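Sketched naively (hypothetical modules and sizes, just to illustrate that second option): freeze the pretrained recognition weights and train only a freshly initialized font head during the fine-tuning phase.

```python
# Hypothetical sketch: attach a new, randomly initialized font-class head to
# an already trained recognition backbone and train only that head.
import torch
import torch.nn as nn

backbone = nn.GRU(64, 256, batch_first=True, bidirectional=True)  # stands in for the pretrained body
font_head = nn.Linear(512, 12)                                    # new, uninitialized (random) layer

for p in backbone.parameters():
    p.requires_grad = False                       # keep the OCR weights fixed
optimizer = torch.optim.Adam(font_head.parameters(), lr=1e-3)

feats, _ = backbone(torch.randn(1, 100, 64))      # (batch, frames, features)
font_logits = font_head(feats)                    # per-frame font-class logits to be trained
```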

GemCarr reopened this Mar 14, 2023
seuretm commented Mar 14, 2023

For now, we have tried once to modify the COCR architecture to also output font groups at character level, which partially (but not fully) forced the different components to specialize on different font groups. Unfortunately, it had a negative impact on the CER. Investigating this further is on our to-do list, but it will require time, as training combined OCR models isn't that fast.
