[pt] Use new POS tagging schema #10375
- as per premium issue #7032;
- instead of defining tokenising characters, we define *word* characters;
- strings of non-word chars are tokenised individually.
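To make the "word characters" idea concrete, here is a minimal Python sketch. It is not LanguageTool's tokeniser. It assumes that a word is a run of Unicode letters or digits optionally joined by internal hyphens, and it reads "tokenised individually" as one token per non-word character. `tokenise_old()` approximates the earlier behaviour, where the hyphen acted as a tokenising character.

```python
import re

# Assumption (illustrative only): a word is a run of Unicode letters/digits,
# optionally joined by single internal hyphens, so "diz-me" stays whole.
WORD = r"[^\W_]+(?:-[^\W_]+)*"

# New scheme: match a whole word, otherwise take exactly one non-word character.
NEW_TOKEN_RE = re.compile(WORD + r"|.", re.DOTALL)

# Approximation of the old scheme: the hyphen is a tokenising character.
OLD_TOKEN_RE = re.compile(r"[^\W_]+|.", re.DOTALL)


def tokenise(text: str) -> list[str]:
    """Keep words whole; every non-word character becomes its own token."""
    return NEW_TOKEN_RE.findall(text)


def tokenise_old(text: str) -> list[str]:
    """Split on hyphens too, so clitic sequences fall apart."""
    return OLD_TOKEN_RE.findall(text)


if __name__ == "__main__":
    for sample in ["diz-me", "fá-lo-á", "Diz-me!?"]:
        print(sample, tokenise_old(sample), tokenise(sample))
```

With these assumptions, `fá-lo-á` is a single token instead of five, and a free-standing hyphen (as in `a - b`) is still split off as punctuation.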
…CORDANCIA_COM_NUCLEO_DO_SUJEITO_V2[2], REFLEXIVE_VERB_SE_AGREEMENT
4a9b030 to e26fb73
@marcoagpinto I'm pinging you here so you're aware of it. After this PR goes through, you should be able to go back to editing rules and the dictionary freely. Thank you for your patience!

Heya! Thank you for letting me know. I missed coding rules so much 😢 😢 😢 😢 😢 @ricardojosehlima

@susanaboatto I hope you've had a chance to have a look. Since it is Friday, I want to wait until Monday to merge this to avoid surprises over the weekend.
v0.13 or later!

tl;dr

This PR adds POS tags to the `pt` tagset for enclitic pronouns.

What changes
Let's use `diz-me` ('tell me') to illustrate the changes.

- before: three tokens (`diz`, `-`, `me`)
- after: one token (`diz-me`), analysed as `dizer[VMM02S0:PP1CSO00]`
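The combined analysis `dizer[VMM02S0:PP1CSO00]` appears to pack a lemma plus two tags, the verb's and the enclitic pronoun's, joined by a colon. Assuming that `lemma[TAG1:TAG2]` shape (my reading of this one example, not a documented format), a minimal Python sketch for splitting it apart could look like this; `Reading` and `parse_reading()` are hypothetical names.

```python
import re
from typing import NamedTuple


class Reading(NamedTuple):
    """Hypothetical container for one lemma plus its component POS tags."""
    lemma: str
    tags: list[str]


# Assumed shape: lemma, then the tags in square brackets, separated by ':'.
READING_RE = re.compile(r"^(?P<lemma>[^\[\]]+)\[(?P<tags>[^\[\]]+)\]$")


def parse_reading(reading: str) -> Reading:
    """Split 'lemma[TAG1:TAG2]' into the lemma and a list of tags."""
    match = READING_RE.match(reading)
    if match is None:
        raise ValueError(f"not a lemma[TAG] reading: {reading!r}")
    return Reading(match.group("lemma"), match.group("tags").split(":"))


if __name__ == "__main__":
    r = parse_reading("dizer[VMM02S0:PP1CSO00]")
    print(r.lemma, r.tags)  # verb tag first, then the pronoun tag
```

Keeping the verb tag and the pronoun tag as separate list items means a rule could still match on the verb part alone.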
Consequences
- `ama-se` <=> `amasse`;
- `amá` and `lo` from `amá-lo` should now both be flagged as spelling errors;
- `fá-lo-á` doesn't require five tokens, three of which shouldn't exist in isolation!

Also in this PR
- `EnclisisFilter` and `ProclisisFilter` to make synthesising verb forms with pronouns easier.
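The code of `EnclisisFilter` and `ProclisisFilter` isn't shown here. To illustrate what synthesising verb forms with pronouns involves, here is a minimal Python sketch with hypothetical functions `enclisis()` and `proclisis()`. It is not the filters' implementation. It covers only plain hyphenation and the regular infinitive rule for `o`/`a`/`os`/`as` (`amar` + `o` → `amá-lo`). Irregular verbs such as `pôr`, finite forms ending in -s/-z (`diz` + `o` → `di-lo`) and nasal endings (`amam-no`) are deliberately left out.

```python
# Hypothetical, simplified sketch; NOT LanguageTool's EnclisisFilter/ProclisisFilter.

# Third-person accusative pronouns that become lo/la/los/las after an infinitive.
ACCUSATIVE = {"o", "a", "os", "as"}

# Stem vowel once the infinitive's final -r is dropped:
# amar -> amá-lo, vender -> vendê-lo, partir -> parti-lo.
INFINITIVE_VOWEL = {"ar": "á", "er": "ê", "ir": "i"}


def enclisis(verb: str, pronoun: str) -> str:
    """Attach the pronoun after the verb with a hyphen (regular cases only)."""
    ending = verb[-2:]
    if pronoun in ACCUSATIVE and ending in INFINITIVE_VOWEL:
        # Drop the -r, adjust the vowel, and prefix the pronoun with l-.
        stem = verb[:-2] + INFINITIVE_VOWEL[ending]
        return f"{stem}-l{pronoun}"
    return f"{verb}-{pronoun}"


def proclisis(verb: str, pronoun: str) -> str:
    """Place the pronoun before the verb as a separate word."""
    return f"{pronoun} {verb}"


if __name__ == "__main__":
    print(enclisis("diz", "me"), enclisis("amar", "o"), proclisis("diz", "me"))
```

A real filter would need the dictionary's synthesiser rather than string rules, which is presumably why the PR adds dedicated filters.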