Added Cantonese support, fixed some bugs. #11

Open
wants to merge 9 commits into base: master


@jckt jckt commented Jul 27, 2018

Added Cantonese pronunciation support

Main features added:

  • Cantonese support, including a new dictionary with Cantonese pingjam (Cantonese pinyin).
  • Integration of Cantonese pingjam into theme/colour display options.
  • Ability to limit number of entries shown in popup (partly because the new dictionary has many more entries, Cantonese-specific words/phrases, etc.).

Fixed bugs, notably:

  • Popup would not jump up when cursor is near bottom of window.
  • Conversion to simplified characters was incorrectly implemented (for example see here), and there were other issues with how the "source" and "target" strings were used (for example, 沒 was produced from 冇, which is incorrect). The current implementation is less efficient but does not lead to incorrect or incomplete results.

jckt added 5 commits July 27, 2018 15:18
…g. Ctrl), but that modifier key is not in use by LiuChan, LiuChan would still try to interpret the key. For example, a user who does not have Ctrl set as a modifier key wishes to copy some text (Ctrl+C). Before this fix, LiuChan would interpret the pressed C as the command to copy the dictionary text, when the user really expects the behaviour of Ctrl+C to be unchanged.
@Paperfeed
Owner

Paperfeed commented Jul 31, 2018

Which dictionary are you using for Cantonese? Is it CC-Canto from http://cantonese.org/?

edit:
Thanks for the contribution by the way :)

@jckt
Author

jckt commented Aug 1, 2018

The raw data comes from CC-Canto and the CC-CEDICT Cantonese readings (both from cantonese.org); these were processed into a single file.

You're welcome!

@gkovacs
Contributor

gkovacs commented Sep 16, 2018

This is great! I tested it, and it resolves an issue that has been constantly annoying me: words like 捨棄 failing to be looked up. Is there anything blocking this from being merged?

gkovacs and others added 2 commits September 16, 2018 13:20
fix incorrect zhuyin - mo corresponds to ㄇㄛ (\u3107\u311b) not ㄇㄨㄛ (\u3107\u3128\u311b)
@gkovacs
Contributor

gkovacs commented Oct 5, 2018

I noticed that with this branch jyutping seems to be unavailable for 律 and all words containing it, e.g. 法律, 律师, 旋律, 音律, 因果律, 定律, 菲律宾. I'm not sure why.

@gkovacs
Contributor

gkovacs commented Oct 5, 2018

Found the reason for the above error: it looks like the script that generates cedict_combined.u8 might have some bugs, as it doesn't seem to include jyutping everywhere. See the entry below (jyutping should be between the { }):

法律 法律 [fa3 lu:4] { } /law/CL:條|条[tiao2], 套[tao4], 個|个[ge4]/
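To find every entry affected in this way, one could scan the combined file for lines whose { } field is empty. The following is only a minimal sketch: the line layout (traditional, simplified, [pinyin], {jyutping}, /definitions/) is inferred from the sample line above, and `ENTRY_RE`, `parse_entry` and `missing_jyutping` are hypothetical names, not part of LiuChan or the generator scripts. The definition field is treated as optional here by choice.

```python
import re

# Assumed layout, inferred from the sample line in this thread:
#   TRAD SIMP [pinyin] {jyutping} /def1/def2/
# The trailing /.../ field is optional in this sketch.
ENTRY_RE = re.compile(
    r"^(?P<trad>\S+)\s+(?P<simp>\S+)\s+"
    r"\[(?P<pinyin>[^\]]*)\]\s*"
    r"\{(?P<jyutping>[^}]*)\}\s*"
    r"(?:/(?P<defs>.*)/)?\s*$"
)


def parse_entry(line):
    """Parse one dictionary line into a dict, or return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # "{ }" holds only whitespace, so stripping leaves an empty string.
    entry["jyutping"] = entry["jyutping"].strip()
    entry["defs"] = entry["defs"].split("/") if entry["defs"] else []
    return entry


def missing_jyutping(lines):
    """Return the traditional headwords of parsed entries with no jyutping."""
    found = []
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and not entry["jyutping"]:
            found.append(entry["trad"])
    return found
```

Calling `missing_jyutping` on the lines of cedict_combined.u8 would then list the headwords that still need a reading; lines that do not fit the assumed layout are skipped rather than reported.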

@gkovacs
Contributor

gkovacs commented Oct 5, 2018

Oh, this seems to impact every word containing a character whose pinyin contains v (u:), like 女, 绿, 吕, 驴. Presumably it's an issue with the script that generates cedict_combined.u8 (which unfortunately doesn't seem to be included in the repository).

@jckt
Author

jckt commented Oct 6, 2018

I wrote a big message just now about how in general I've tried to avoid autocompleting jyutping on a per-character basis (it leads to many errors; even the Pleco dictionary on iPhone has them, and AFAIK it uses a better version of the CC-Canto sources). But you're right, in this case it's my fault: there is a bug in the generator scripts. In fact, the entry is double-entered; somewhere else in the file:
法律 法律 [fa3 lv4] {faat3 leot6}
So there are now two ways of expressing ü in the dictionary (I forget whether this is a problem; I'll check again when I have the time). In this case I guess one could either condense the two entries (easy here, since the entry above is malformed: it has no / / field for a (blank) definition, so the regex misses it completely, which is why it doesn't even show up as a definition-free entry), or leave the two entries but auto-clean the pinyin and the / / definition field. I'll try to fix it as soon as I have the time.
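The first option (condensing the two entries) could look roughly like the sketch below. It is an illustration under stated assumptions, not the generator scripts' actual logic: entries are assumed to be already parsed into plain dicts, "u:" is picked arbitrarily as the canonical spelling of ü, and `normalise_pinyin` and `merge_entries` are hypothetical names.

```python
def normalise_pinyin(pinyin):
    """Rewrite the "v" spelling of ü as "u:" so both spellings compare equal.

    Assumption: "v" never occurs in these pinyin strings except as a stand-in for ü.
    """
    return pinyin.replace("v", "u:")


def merge_entries(entries):
    """Merge entries that share headwords and (normalised) pinyin.

    Each entry is a dict with keys: trad, simp, pinyin, jyutping, defs.
    The first non-empty jyutping wins; definitions are combined without duplicates.
    """
    merged = {}
    for entry in entries:
        pinyin = normalise_pinyin(entry["pinyin"])
        key = (entry["trad"], entry["simp"], pinyin)
        slot = merged.setdefault(key, {
            "trad": entry["trad"],
            "simp": entry["simp"],
            "pinyin": pinyin,
            "jyutping": "",
            "defs": [],
        })
        if not slot["jyutping"] and entry["jyutping"]:
            slot["jyutping"] = entry["jyutping"]
        for definition in entry["defs"]:
            if definition not in slot["defs"]:
                slot["defs"].append(definition)
    return list(merged.values())


# The two 法律 entries quoted in this thread, as already-parsed dicts.
duplicates = [
    {"trad": "法律", "simp": "法律", "pinyin": "fa3 lu:4", "jyutping": "", "defs": ["law"]},
    {"trad": "法律", "simp": "法律", "pinyin": "fa3 lv4", "jyutping": "faat3 leot6", "defs": []},
]
combined = merge_entries(duplicates)
```

Keying on the normalised pinyin keeps genuinely different readings of the same headword apart while still folding the two ü spellings together.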

For now, I've attached the dictionary generator scripts. I didn't include them in the branch since I thought I would quickly clean them up and include some autocomplete system that also gave correct results (but that's actually a much harder problem than I thought it was).

Thanks again for pointing this out.
generators.zip

@orientalperil

@Paperfeed Any chance this can get merged and deployed to the Chrome Web Store? I'm interested in being able to use Cantonese and can help push this along if more changes are needed.

4 participants