Noticed that in most HPLT documents that CLD2 identifies as Uzbek and that are written in Cyrillic, fastText classifies the sentences as other languages written in Cyrillic, such as `ru`, `kk`, `tt`, `ug`, `az`. The list of possible cases is large, so this language may need a special mode where we simply check the Cyrillic and Latin Uzbek dictionaries and, if the error is less than 30%, keep it as `uz`.
There is one dictionary for both scripts here: https://github.com/u2b3k/uz-hunspell
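
A minimal sketch of what such a special mode could look like, to make the proposal concrete. Several things here are assumptions rather than anything decided in this issue: "error" is taken to mean the share of tokens not found in the dictionary, the dictionary is assumed to be already loaded as a plain set of lowercase word forms covering both scripts (hunspell affix rules are ignored), and the names `unknown_ratio` and `keep_as_uzbek` are hypothetical.

```python
import re

# A token is a run of letters, optionally joined by an apostrophe
# (Latin Uzbek spells some letters with one, e.g. o' and g').
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def unknown_ratio(text, vocab):
    """Return the fraction of tokens in `text` that are not in `vocab`.

    `vocab` is assumed to be a set of lowercase word forms.
    Text with no tokens is treated as fully unknown (ratio 1.0).
    """
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    if not tokens:
        return 1.0
    unknown = sum(1 for t in tokens if t not in vocab)
    return unknown / len(tokens)


def keep_as_uzbek(text, vocab, max_error=0.30):
    """Hypothetical special mode: keep the `uz` label when the
    dictionary error is below `max_error` (30% by default)."""
    return unknown_ratio(text, vocab) < max_error


if __name__ == "__main__":
    # Toy vocabulary for illustration only; a real run would load the
    # word list from the dictionary linked above.
    vocab = {"мен", "китоб", "ўқидим", "men", "kitob"}
    print(keep_as_uzbek("Мен китоб ўқидим", vocab))
    print(keep_as_uzbek("Я читаю книгу", vocab))
```

The check would only run for documents that CLD2 already labelled as Uzbek, overriding the fastText label when the dictionary agrees. The 30% threshold is the value suggested above and would likely need tuning on real HPLT data.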