Add better support for Brazilian Portuguese #4302
Comments
Latest Tesseract with the model script/Latin gives a better result for the first image:
What is the config to get this result in Portuguese? Is it "-l lat+script/Latin" or "-l por+script/Latin"? My current config is: config_tesseract = fr'--tessdata-dir "{TESSDATA_PREFIX}" -l lat+script/Latin --oem 3 --psm 6'
It's simply |
Note also that a correct installation of Tesseract does not need |
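For readers comparing the language combinations asked about above, here is a minimal Python sketch that assembles the same kind of config string from its parts. The helper name `build_tesseract_config` and its parameters are hypothetical and not part of Tesseract or pytesseract; it only joins the `-l`, `--oem`, `--psm`, and optional `--tessdata-dir` options that appear in the question, and it does not say which language combination is the right one.

```python
def build_tesseract_config(languages, tessdata_dir=None, oem=3, psm=6):
    """Assemble a Tesseract option string (hypothetical helper for illustration).

    languages    -- model names such as "por" or "script/Latin"; Tesseract
                    accepts several models joined with "+" after -l
    tessdata_dir -- optional model directory; omitted from the string when None
    oem, psm     -- engine mode and page segmentation mode, as in the question
    """
    parts = []
    if tessdata_dir:
        # Quote the path so a directory containing spaces stays one argument.
        parts += ["--tessdata-dir", f'"{tessdata_dir}"']
    parts += ["-l", "+".join(languages)]
    parts += ["--oem", str(oem), "--psm", str(psm)]
    return " ".join(parts)


# The two variants from the question, built the same way for comparison.
config_lat = build_tesseract_config(["lat", "script/Latin"])
config_por = build_tesseract_config(["por", "script/Latin"])
```

The resulting string could then be passed as the `config` argument of `pytesseract.image_to_string`, assuming pytesseract is the wrapper in use, which the `config_tesseract` variable in the question suggests but does not confirm.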
I tested OCR on scanned documents in Brazilian Portuguese and found that Tesseract makes many mistakes on them.
Current Behavior
result from https://huggingface.co/spaces/kneelesh48/Tesseract-OCR
Expected Behavior
the correct thing would be
Windows 11
https://huggingface.co/spaces/kneelesh48/Tesseract-OCR