We found that the performance of Liv4ever-MT has been underestimated due to a Unicode inconsistency problem. We describe this problem in detail in our system description paper. Here we provide the scripts to reproduce the experiments.
By the way, it is easy to normalize Unicode with Python (see norm_unicode.py).
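As a rough illustration of the problem (not the contents of norm_unicode.py, which are not shown here), the sketch below uses the standard-library `unicodedata` module. The helper name `to_nfkc` and the example strings are made up for this illustration. It shows how two strings that look identical can compare unequal until both are brought to NFKC, which is the kind of mismatch that can lower an n-gram based score.

```python
import unicodedata


def to_nfkc(text: str) -> str:
    """Return text in Unicode Normalization Form KC (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)


# The same visible word in two encodings (illustrative strings only).
decomposed = "ja\u0304"  # 'a' followed by COMBINING MACRON (U+0304)
composed = "j\u0101"     # precomposed LATIN SMALL LETTER A WITH MACRON (U+0101)

# Raw comparison fails although both render the same way.
print(decomposed == composed)

# After normalizing both sides to NFKC, the strings match.
print(to_nfkc(decomposed) == to_nfkc(composed))
```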
Dependency
python3 -m pip install -U prettytable
Normalize references to NFKC
sh norm-ref.sh
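For readers who want to see what normalizing a reference file might involve, here is a minimal sketch. It is an assumption about the general approach, not a description of what norm-ref.sh actually does. The function name `normalize_file` and the file names in the usage note are hypothetical.

```python
import unicodedata
from pathlib import Path


def normalize_file(src: Path, dst: Path, form: str = "NFKC") -> int:
    """Write a normalized copy of src to dst and return how many lines changed.

    Hypothetical helper: reads and writes UTF-8, one line at a time.
    """
    changed = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            normalized = unicodedata.normalize(form, line)
            if normalized != line:
                changed += 1
            fout.write(normalized)
    return changed
```

A call such as `normalize_file(Path("ref.txt"), Path("ref.nfkc.txt"))` (placeholder file names) would report how many reference lines were affected. A count of zero would suggest the file was already in NFKC.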
Download Liv4ever-MT model
See the Preparation section in the main README.md.
Generate translations
sh gen.sh
Evaluate
python3 score.py
Output:
+---------------------------------------------------------------------------------------------------------------------------+
| Before normalizing references to NFKC |
+-------------+-------+-------+--------+-------+-------+--------+-------+-------+--------+--------+--------+--------+-------+
| subset | et-en | lv-en | liv-en | en-et | lv-et | liv-et | en-lv | et-lv | liv-lv | en-liv | et-liv | lv-liv | avg. |
+-------------+-------+-------+--------+-------+-------+--------+-------+-------+--------+--------+--------+--------+-------+
| Full | 25.90 | 17.94 | 18.90 | 19.28 | 22.31 | 22.86 | 20.20 | 23.31 | 24.88 | 10.90 | 16.62 | 17.69 | 20.07 |
| Facebook | 28.43 | 13.95 | 19.44 | 25.32 | 22.97 | 24.89 | 26.60 | 28.21 | 33.14 | 13.93 | 19.26 | 21.23 | 23.11 |
| livones.net | 25.80 | 18.85 | 19.73 | 21.05 | 20.54 | 18.39 | 26.98 | 30.16 | 29.73 | 15.09 | 19.93 | 23.82 | 22.51 |
| dictionary | 16.06 | 12.12 | 7.94 | 13.30 | 25.27 | 36.73 | 7.89 | 28.63 | 26.75 | 10.61 | 32.01 | 28.51 | 20.48 |
| trilium | 32.36 | 17.93 | 18.89 | 27.02 | 28.66 | 26.71 | 21.08 | 30.79 | 27.78 | 14.42 | 20.59 | 20.00 | 23.85 |
| stalte | 21.86 | 12.59 | 13.81 | 12.69 | 24.83 | 29.34 | 10.86 | 24.53 | 31.84 | 9.38 | 25.25 | 24.63 | 20.14 |
| esuka | 14.94 | 24.31 | 7.26 | 11.15 | 11.21 | 14.67 | 32.31 | 13.71 | 7.58 | 5.15 | 10.40 | 5.71 | 13.20 |
| satversme | 27.50 | 19.77 | 24.68 | 16.69 | 20.22 | 18.68 | 16.05 | 15.10 | 19.38 | 7.58 | 7.18 | 9.23 | 16.84 |
+-------------+-------+-------+--------+-------+-------+--------+-------+-------+--------+--------+--------+--------+-------+
+---------------------------------------------------------------------------------------------------------------------------+
| After normalizing references to NFKC |
+-------------+-------+-------+--------+-------+-------+--------+-------+-------+--------+--------+--------+--------+-------+
| subset | et-en | lv-en | liv-en | en-et | lv-et | liv-et | en-lv | et-lv | liv-lv | en-liv | et-liv | lv-liv | avg. |
+-------------+-------+-------+--------+-------+-------+--------+-------+-------+--------+--------+--------+--------+-------+
| Full | 26.20 | 18.06 | 19.26 | 20.72 | 24.28 | 24.42 | 24.10 | 27.77 | 29.33 | 14.31 | 20.51 | 22.35 | 22.61 |
| Facebook | 28.43 | 13.95 | 19.44 | 25.32 | 22.97 | 24.89 | 26.60 | 28.21 | 33.14 | 13.93 | 19.26 | 21.23 | 23.11 |
| livones.net | 25.80 | 18.85 | 19.73 | 21.05 | 20.54 | 18.39 | 26.98 | 30.16 | 29.73 | 15.09 | 19.93 | 23.82 | 22.51 |
| dictionary | 16.06 | 12.12 | 7.94 | 13.30 | 25.27 | 36.73 | 7.89 | 28.63 | 26.75 | 10.61 | 32.01 | 28.51 | 20.48 |
| trilium | 32.36 | 17.93 | 18.89 | 27.02 | 28.66 | 26.71 | 21.08 | 30.79 | 27.78 | 14.42 | 20.59 | 20.00 | 23.85 |
| stalte | 21.86 | 12.59 | 13.81 | 12.69 | 24.83 | 29.34 | 10.86 | 24.53 | 31.84 | 9.38 | 25.25 | 24.63 | 20.14 |
| esuka | 14.94 | 24.31 | 7.26 | 11.15 | 11.21 | 14.67 | 32.31 | 13.71 | 7.58 | 5.15 | 10.40 | 5.71 | 13.20 |
| satversme | 28.45 | 20.21 | 25.76 | 21.41 | 26.74 | 23.75 | 29.10 | 29.82 | 33.56 | 18.23 | 19.87 | 24.15 | 25.09 |
+-------------+-------+-------+--------+-------+-------+--------+-------+-------+--------+--------+--------+--------+-------+
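In the two tables above, only the Full and satversme rows differ. The following sketch copies the satversme scores from both tables and computes the per-direction gain and the row averages. The variable names are chosen for this illustration and are not taken from score.py.

```python
from statistics import mean

directions = [
    "et-en", "lv-en", "liv-en", "en-et", "lv-et", "liv-et",
    "en-lv", "et-lv", "liv-lv", "en-liv", "et-liv", "lv-liv",
]

# satversme row, copied from the "before" and "after" tables.
before = [27.50, 19.77, 24.68, 16.69, 20.22, 18.68,
          16.05, 15.10, 19.38, 7.58, 7.18, 9.23]
after = [28.45, 20.21, 25.76, 21.41, 26.74, 23.75,
         29.10, 29.82, 33.56, 18.23, 19.87, 24.15]

# Gain per translation direction after normalizing the references.
gains = {d: round(a - b, 2) for d, b, a in zip(directions, before, after)}

for direction, gain in gains.items():
    print(f"{direction:>7}: +{gain:.2f}")

# Row averages, which should agree with the avg. column up to rounding.
print(f"average before: {mean(before):.2f}")
print(f"average after:  {mean(after):.2f}")
```

The gains are small when translating into English and much larger for the other target languages, with lv-liv showing the largest difference.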