Help Improving OCR of a document #637

adrianlfns · 2023-03-20T14:55:12Z

Hello,

I'm using your wrapper for Tesseract OCR to extract text from a PDF. I've uploaded a sample of the PDF I'm trying to OCR, along with the trained data I'm using. If you run OCR on that image, you will see that the resulting text is of very poor quality. Is there anything I can do to improve the OCR output?

Thank you in advance.

TrainedData.zip

charlesw · 2023-03-21T19:39:50Z

The example appears to be a very low-resolution scan. Tesseract doesn't perform well in these cases; the recommended resolution is approximately 300 dpi. Good luck.
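
For anyone who wants to check how far a scan is from that target, here is a minimal sketch of the arithmetic in Python. The function names and the 96 dpi sample page are hypothetical, chosen only for illustration; nothing here is part of the wrapper or of Tesseract. It only computes the pixel dimensions an image would need at 300 dpi. The actual resampling would be done with an imaging library of your choice, and enlarging a low-resolution scan cannot restore detail that was never captured, so rescanning at a higher resolution is likely to help more when that is possible.

```python
import math

TARGET_DPI = 300  # approximate resolution recommended in the reply above


def scale_factor(current_dpi: float, target_dpi: float = TARGET_DPI) -> float:
    """Return the factor by which to enlarge an image to reach target_dpi.

    Images already at or above the target are left unchanged (factor 1.0).
    """
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


def upscaled_size(width_px: int, height_px: int, current_dpi: float,
                  target_dpi: float = TARGET_DPI) -> tuple[int, int]:
    """Return the pixel dimensions the image would have at target_dpi."""
    factor = scale_factor(current_dpi, target_dpi)
    return math.ceil(width_px * factor), math.ceil(height_px * factor)


if __name__ == "__main__":
    # Hypothetical input: a US Letter page scanned at 96 dpi (816 x 1056 px).
    print(upscaled_size(816, 1056, current_dpi=96))
```

The resulting size can then be passed to whatever resize routine your imaging library provides before the image is handed to the OCR engine.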

