FastSpell (yet another) next round #23

Open
10 of 21 tasks
mbanon opened this issue Oct 30, 2024 · 12 comments

@mbanon
Owner

mbanon commented Oct 30, 2024

Some candidates here (WIP):

@mbanon mbanon self-assigned this Oct 31, 2024
@mbanon
Owner Author

mbanon commented Feb 10, 2025

  • Dzongkha vs Tibetan: not happening because dictionaries are actually syllabaries

@mbanon
Owner Author

mbanon commented Feb 10, 2025

  • Guarani vs Spanish: added a Hunspell dictionary for Guarani, but there are still a lot of false es results.

Then:

    643 gn
    206 es
     27 pt
     ...

Now:

    656 gn
    193 es
     27 pt
     ...

(out of 1000 sentences from Flores-plus)
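
For reference, a tally in this format can be produced by counting the predicted labels, much as `sort | uniq -c | sort -nr` would. This is a minimal sketch, assuming the predictions are already available as a list of language codes. The `tally` helper and the sample list are illustrative and not part of FastSpell; how the counts in this thread were actually produced is not stated.

```python
from collections import Counter

def tally(labels):
    """Return lines like '    643 gn', most frequent label first."""
    counts = Counter(labels)
    # Sort by descending count, then by label for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{n:7d} {lang}" for lang, n in ordered]

# Hypothetical predictions for five sentences (not real data).
predicted = ["gn", "es", "gn", "pt", "gn"]
print("\n".join(tally(predicted)))
```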

@mbanon
Owner Author

mbanon commented Feb 10, 2025

  • Occitan vs Spanish vs Catalan: Very nice results. I used the Occitan dictionary that we already had for Catalan, not the one linked above.

Then:

    628 oc
    308 ca
     40 es
     16 fr
      3 ro
      2 pt

Now:

    976 oc
     16 fr
      3 ro
      2 pt
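
To see where the sentences moved between the two runs, the tallies can be diffed per language. This is a small sketch using only the Occitan numbers above; `parse_tally` and `diff_tally` are hypothetical helper names introduced for illustration.

```python
def parse_tally(text):
    """Parse lines such as '628 oc' into {'oc': 628}; skip '...' lines."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            counts[parts[1]] = int(parts[0])
    return counts

def diff_tally(before, after):
    """Per-language change in sentence count (after minus before)."""
    langs = sorted(set(before) | set(after))
    changes = {l: after.get(l, 0) - before.get(l, 0) for l in langs}
    # Keep only the languages whose count actually changed.
    return {l: d for l, d in changes.items() if d != 0}

then = parse_tally("628 oc\n308 ca\n40 es\n16 fr\n3 ro\n2 pt")
now = parse_tally("976 oc\n16 fr\n3 ro\n2 pt")
print(diff_tally(then, now))  # {'ca': -308, 'es': -40, 'oc': 348}
```

The 348 sentences gained by oc match exactly the 308 ca and 40 es predictions that disappeared.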

@mbanon
Owner Author

mbanon commented Feb 11, 2025

  • Turkmen vs Turkish: Very nice results too. Used TK and TR dictionaries already in the dictionary pack.

Then:

    871 tk
    103 tr
      9 en
      5 et
      2 uz
      ...

Now:

    974 tk
      9 en
      5 et
      2 uz
      ...

@mbanon
Owner Author

mbanon commented Feb 11, 2025

  • Maithili vs Hindi: also added.

Then:

    755 hi
    148 mai
     93 bh
      1 mr

Now:

    893 mai
     93 bh
     10 hi
      1 mr

@mbanon
Owner Author

mbanon commented Feb 11, 2025

  • Sardinian: Added with Spanish, Catalan, Italian and Romanian. Might add French at some point if needed.

Then:

    252 es
    233 ca
    188 it
    127 ro
     54 fr
     32 oc
     26 pt
     11 tl
      9 sc

Now:

    809 sc
     54 fr
     32 oc
     26 pt
     11 tl
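
The general idea behind pairing a language with a set of similar ones can be illustrated with a toy dictionary lookup: pick the candidate whose word list covers the most tokens. This is only a sketch under loud assumptions. The word lists below are tiny hypothetical placeholders, not real Hunspell dictionaries, and `pick_language` is not FastSpell's actual code or API.

```python
def pick_language(sentence, wordlists):
    """Return the language whose word list covers the most tokens.

    Ties go to the language listed first in `wordlists`.
    """
    tokens = sentence.lower().split()
    best_lang, best_hits = None, -1
    for lang, words in wordlists.items():
        hits = sum(1 for t in tokens if t in words)
        if hits > best_hits:
            best_lang, best_hits = lang, hits
    return best_lang

# Hypothetical, tiny word lists; real dictionaries are far larger.
wordlists = {
    "sc": {"sa", "domo", "est", "manna"},
    "it": {"la", "casa", "è", "grande"},
    "es": {"la", "casa", "es", "grande"},
}
print(pick_language("sa domo est manna", wordlists))  # sc
```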

@mbanon
Owner Author

mbanon commented Feb 11, 2025

  • Latgalian: Added with LT and LV (and also in the reverse direction: LV with LTG). Not perfect, but at least it's supported now.

Then:

    801 lv
    134 lt
     20 fi
     17 sl

Now:

    465 ltg
    405 lv
     65 lt
     20 fi

@mbanon
Owner Author

mbanon commented Feb 11, 2025

  • Silesian vs Polish: Very cool!

Then:

    996 pl
      1 hr

Now:

    994 szl
      2 pl
      1 hr

@mbanon
Owner Author

mbanon commented Feb 11, 2025

  • Papiamento: I added dictionaries for Aruba's Papiamento (Curaçao's was also available but seemed to be derived from Aruba's). The result is not perfect, but at least Papiamento is supported now.

Then:

    549 es
    194 it
     36 pt
     26 ca
     ...

Now:

    633 pap
    103 es
     36 pt
     26 ca
     24 io
     ...

@mbanon
Owner Author

mbanon commented Feb 12, 2025

  • Quechua: Tried with Quechua dictionaries from Ecuador. Not a great improvement. Not adding.

Then:

    779 qu
     97 es
     49 en
     29 pt
     ...

Now:

    781 qu
     95 es
     49 en
     29 pt

@mbanon
Owner Author

mbanon commented Feb 12, 2025

  • Magahi: Not perfect, adding it anyway for Magahi support.

Then:

    994 hi
      2 bh
      1 ne

Now:

    542 hi
    452 mag
      2 bh
      1 ne

@mbanon
Owner Author

mbanon commented Feb 12, 2025

  • Friulian: Added with Italian and French.

Then:

    609 it
    223 fr
     89 ro
     27 es
     11 pl
     10 eml
     ...

Now:

    832 fur
     89 ro
     27 es
     11 pl
     10 eml
     ...
