Incorrectly encoding text when backFormat is text #68

mwhapples · 2020-02-21T20:09:37Z

When performing back-translation with file2brl, if the configuration has backFormat set to text and the back-translated text contains Unicode characters outside the ASCII range, those characters are encoded incorrectly.
As an example, using en-ueb-g2.ctb as the translation table, try back-translating a word containing an apostrophe (e.g. I'M, CAN'T). The apostrophe is output as the single byte 0x19.
Having tested file2brl with backFormat set to html, it appears that in this example the apostrophe is back-translated to the Unicode character U+2019 (RIGHT SINGLE QUOTATION MARK). Since 0x19 is the low byte of 0x2019, I suspect file2brl is simply dropping the high byte of each Unicode character when backFormat is set to text.
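
To make the suspected behaviour concrete, here is a minimal Python sketch. It is not file2brl's code; `truncate_to_low_byte` is a hypothetical model of the truncation I suspect, and `encode_utf8` shows what a UTF-8 text output would contain instead (assuming UTF-8 is the intended output encoding).

```python
def truncate_to_low_byte(text: str) -> bytes:
    """Hypothetical model of the suspected bug: keep only the low byte of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


def encode_utf8(text: str) -> bytes:
    """Encode the text as UTF-8, preserving non-ASCII characters."""
    return text.encode("utf-8")


# "I'M" with the apostrophe as U+2019, as seen with backFormat set to html.
word = "I\u2019M"

truncated = truncate_to_low_byte(word)
expected = encode_utf8(word)

print(truncated.hex(" "))  # 49 19 4d  (apostrophe collapsed to byte 0x19)
print(expected.hex(" "))   # 49 e2 80 99 4d  (apostrophe as three UTF-8 bytes)
```

The truncated output matches the 0x19 byte observed in the text output, which is consistent with the high byte being dropped, although it does not prove that this is what file2brl does internally.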
