Running Earwig's Copyvio Detector on Labs against a Hebrew-text PDF produces output in which the characters within each word are in reversed order, i.e. the correct text is more or less [word[::-1] for word in words] :)
To ease debugging: even Latin-script text (URLs / emails) may come out reversed for this PDF.
Input PDF: http://img2.tapuz.co.il/CommunaFiles/53173603.pdf
(query: http://tools.wmflabs.org/copyvios/?lang=he&project=wikipedia&title=%D7%A0%D7%99%D7%AA%D7%95%D7%97+%D7%91%D7%A8%D7%99%D7%90%D7%98%D7%A8%D7%99&oldid=&use_engine=0&use_links=0&turnitin=0&action=compare&url=http%3A%2F%2Fimg2.tapuz.co.il%2FCommunaFiles%2F53173603.pdf)
Relevant code (PDF parser, using pdfminer):
earwigbot/wiki/copyvios/parsers.py
This may be an upstream issue in pdfminer, or something wrong with the decoding.
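To check the hypothesis, the per-word reversal described above can be sketched as below. This is only a diagnostic sketch, not a fix: reverse_words is a hypothetical helper that exists in neither earwigbot nor pdfminer, and it assumes the extracted text is already a decoded Unicode string (Python 3 str) with words separated by whitespace.

```python
def reverse_words(text):
    """Reverse the characters inside each whitespace-separated word.

    Word order is left untouched; only the characters within each word
    are flipped, mirroring [word[::-1] for word in words].
    Hypothetical helper, not part of earwigbot or pdfminer.
    """
    return " ".join(word[::-1] for word in text.split())


# Made-up sample of what the garbled parser output might look like.
garbled = "gro.elpmaxe//:ptth olleh"
print(reverse_words(garbled))
```

If running the parser output through such a helper yields readable text that matches the PDF, that would suggest the characters are emitted in visual rather than logical order, and not that the decoding itself is wrong.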