[pt] Added and sorted alphabetically words #11103

Merged
merged 2 commits into from
Dec 13, 2024
@@ -4,22 +4,22 @@
# This does not add words to the spell checker, see hunspell/spelling.txt for that.
# File Encoding: UTF-8
# Format: three tab-separated fields: fullform baseform postags

#
conosco comigo PP1CPO00

💡 Codebase verification

Base form mapping for 'conosco' is incorrect

The entry conosco comigo PP1CPO00 is incorrect. The entities file (misc.ent) treats 'conosco/connosco' and 'comigo' as different tonic pronouns. Read positionally, the tag PP1CPO00 marks a first-person (1), common-gender (C), plural (P), oblique (O) personal pronoun, whereas 'comigo' is first-person singular. The base form for 'conosco' should therefore be 'nós' (the first-person plural pronoun), not 'comigo'.

🔗 Analysis chain

Verify the pronoun tag for 'conosco'

The entry conosco comigo PP1CPO00 maps 'conosco' to 'comigo' as its base form. This seems incorrect as they are different pronouns - 'conosco' means "with us" while 'comigo' means "with me".

Also applies to: 22-22


Let me gather more specific information about Portuguese pronouns in the dictionary.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other pronoun entries to verify the correct base form pattern
rg -A 1 "PP1CPO00|PP1CSO00" 

Length of output: 4189


Script:

#!/bin/bash
# Search for entries with 'conosco', 'comigo', 'connosco' and similar pronouns
rg "conosco|comigo|connosco" languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/

# Also check the morphological tag documentation
cat languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/tagset.txt || true

Length of output: 5427
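
The positional reading of the tag can be sketched in Python as follows. This is only an illustration: the field layout (category, type, person, gender, number, case, possessor, politeness) is an assumption based on EAGLES-style positional tags and should be checked against tagset.txt, and `PRONOUN_FIELDS` and `decode_pronoun_tag` are hypothetical names, not part of LanguageTool.

```python
from typing import Dict

# ASSUMPTION: 8-character personal-pronoun tags follow an EAGLES-style
# positional layout. Verify against tagset.txt before relying on this.
PRONOUN_FIELDS = (
    "category", "type", "person", "gender",
    "number", "case", "possessor", "politeness",
)


def decode_pronoun_tag(tag: str) -> Dict[str, str]:
    """Map each character of a personal-pronoun tag to its assumed field name."""
    if len(tag) != len(PRONOUN_FIELDS) or not tag.startswith("PP"):
        raise ValueError(f"not an 8-character personal-pronoun tag: {tag!r}")
    return dict(zip(PRONOUN_FIELDS, tag))


if __name__ == "__main__":
    for tag in ("PP1CPO00", "PP1CSO00"):
        decoded = decode_pronoun_tag(tag)
        print(tag, "person:", decoded["person"], "number:", decoded["number"])
```

Under that assumption, PP1CPO00 decodes to number P (plural), while a singular pronoun such as 'comigo' would be expected under a tag like PP1CSO00, the second pattern searched for in the first script.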

diversas diverso DI0FP0
diversas diverso PI0FP000
diversos diverso DI0MP0
diversos diverso PI0MP000
conosco comigo PP1CPO00
embaixo embaixo RG
idem idem RG
mega mega RG
mesmíssima mesmíssimo DD0FS0
mesmíssimas mesmíssimo DD0FP0
mesmíssimo mesmíssimo DD0MS0
mesmíssimos mesmíssimo DD0MP0
ok ok RG
OK OK RG
Ok ok RG
ok ok RG
pouquíssima pouquíssimo DI0FS0
pouquíssimas pouquíssimo DI0FP0
pouquíssimo pouquíssimo DD0MS0
@@ -41,12 +41,11 @@ vcs vcs PP3CP000
# Add entries manually from here #
######################################################
######################################################

#
# problematic entries, to review

média média:2 NCMP000
ad ad NCCSS00
ads ad NCCPS00
Airbus Airbus NCCNO00
Alpha Alpha AQ0CS0
Alpha Alpha NCCS000
Alphas Alpha AQ0CP0
@@ -59,6 +58,7 @@ bolds bold NCCP000
bonehead bonehead NCCS000
boneheads bonehead NCCP000
bottom-up bottom-up NCCN000
BT BT AQ0CN0
bull bull NCMS000
C2 C2 NCMN000
card card NCMS000
@@ -81,15 +81,20 @@ dBmW dBmW AO0CN0
Dell Dell NCFSO00
demand demand NCFS000
demands demand NCFP000
desenrolar desenrolar NCMN000
direct direct NCMS000
directs directs NCMP000
⚠️ Potential issue

Fix inconsistent verb entries

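The entries for 'direct' and 'directs' are tagged as common nouns (NC), although both are English verb forms. In addition, 'directs' lists itself as its base form rather than 'direct', unlike pairs such as demands/demand. Unless these are valid Portuguese terms that need noun tags, they should be removed or recategorized.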

-direct	direct	NCMS000
-directs	directs	NCMP000
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 desenrolar desenrolar NCMN000
-direct direct NCMS000
-directs directs NCMP000
🧰 Tools
🪛 LanguageTool

[duplication] ~84-~84: Possible typo: you repeated a word
Context: ...d NCFP000 desenrolar desenrolar NCMN000 direct direct NCMS000 directs directs NCMP000 Ed Ed N...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~85-~85: Possible typo: you repeated a word
Context: ...esenrolar NCMN000 direct direct NCMS000 directs directs NCMP000 Ed Ed NCMSS00 Eds Ed NCMPS00 EF...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~86-~86: Possible typo: you repeated a word
Context: ... direct NCMS000 directs directs NCMP000 Ed Ed NCMSS00 Eds Ed NCMPS00 EF EF NCCS000 ep...

(ENGLISH_WORD_REPEAT_RULE)
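
A check for this kind of base-form inconsistency could look like the Python sketch below. It is a heuristic under stated assumptions, not the project's validation logic: it assumes the three tab-separated fields described in the file header, assumes that the fourth character of an NC tag encodes number (P for plural), and treats a plural noun as suspicious when it is its own base form while the same word without a final "s" also exists as a full form. The names `parse_entries` and `suspicious_plural_lemmas` are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (fullform, baseform, postag)


def parse_entries(text: str) -> List[Entry]:
    """Parse tab-separated dictionary lines, skipping blanks and comments."""
    entries: List[Entry] = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def suspicious_plural_lemmas(entries: List[Entry]) -> List[Entry]:
    """Flag plural nouns that are their own base form although a
    singular-looking full form (same word minus final 's') exists."""
    fullforms = {full for full, _, _ in entries}
    flagged: List[Entry] = []
    for full, base, tag in entries:
        # ASSUMPTION: in NC tags, index 3 holds the number (S/P/N).
        is_plural_noun = tag.startswith("NC") and len(tag) > 3 and tag[3] == "P"
        if is_plural_noun and full == base and full.endswith("s") and full[:-1] in fullforms:
            flagged.append((full, base, tag))
    return flagged


if __name__ == "__main__":
    sample = (
        "demand\tdemand\tNCFS000\n"
        "demands\tdemand\tNCFP000\n"
        "direct\tdirect\tNCMS000\n"
        "directs\tdirects\tNCMP000\n"
    )
    for entry in suspicious_plural_lemmas(parse_entries(sample)):
        print("check base form:", entry)
```

The heuristic only surfaces candidates for manual review; words that legitimately end in "s" in their base form would need to be excluded by hand.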

Ed Ed NCMSS00
Eds Ed NCMPS00
EF EF NCCS000
epóxi epóxi AQ0CN0
extras extra AQ0CP0
flag flag NCFS000
flags flag NCFP000
footnote footnote NCFS000
footnotes footnote NCFP000
friendly friendly AQ0CS0
frontman frontman NCMS000
FSR FSR AQ0CN0
gap gap NCMS000
gap gap NCCS000
@@ -101,10 +106,12 @@ gold gold NCMS000
health health NCFS000
height height NCFS000
heights height NCFP000
hidro hidro NCFS000
idle idle NCMN000
iFood iFood AQ0CN0
layer layer NCFS000
layers layer NCFP000
Linux Linux AQ0CN0
lossless lossless NCCN000
Margherita Margherita AQ0FS0
Margheritas Margherita AQ0FP0
@@ -115,26 +122,35 @@ marketplaces marketplace NCMP000
Marvel Marvel AQ0CN0
Mastercard Mastercard AQ0CN0
MasterCard MasterCard AQ0CN0
média média:2 NCMP000
⚠️ Potential issue

Fix malformed base form in média entry

The entry média média:2 NCMP000 contains a malformed base form with a :2 suffix, which doesn't follow the established format.

-média	média:2	NCMP000
+média	média	NCMP000
📝 Committable suggestion

Suggested change
-média média:2 NCMP000
+média média NCMP000
🧰 Tools
🪛 LanguageTool

[duplication] ~125-~125: Possible typo: you repeated a word
Context: ...MasterCard AQ0CN0 média média:2 NCMP000 mirrorless mirrorless AQ0CN0 oper oper NCCS000 opers oper NCC...

(ENGLISH_WORD_REPEAT_RULE)
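
A format check along these lines could catch such entries automatically. The Python sketch below assumes the three tab-separated fields stated in the file header and treats a base form ending in a colon followed by digits as suspect. Whether the ":2" suffix has a meaning elsewhere in the build pipeline is not known from this diff, so the rule is an assumption; `find_malformed` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# ASSUMPTION: a base form ending in ":<digits>" is unintended.
LEMMA_SUFFIX_RE = re.compile(r":\d+$")


def find_malformed(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for entries that break the format."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        content = line.rstrip("\r\n")
        if not content.strip() or content.startswith("#"):
            continue  # blank lines and comments are not entries
        fields = content.split("\t")
        if len(fields) != 3:
            problems.append((number, "expected 3 tab-separated fields"))
            continue
        _fullform, baseform, _tag = fields
        if LEMMA_SUFFIX_RE.search(baseform):
            problems.append((number, "base form ends with ':<digits>'"))
    return problems


if __name__ == "__main__":
    sample = ["média\tmédia:2\tNCMP000", "média\tmédia\tNCMP000"]
    for number, problem in find_malformed(sample):
        print(f"line {number}: {problem}")
```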

mirrorless mirrorless AQ0CN0
oper oper NCCS000
opers oper NCCP000
Ophiuchus Ophiuchus AQ0CN0
performer performer NCCFS00
pet pet AQ0CN0
pet pet NCMS000
pets pets NCMP000
pix pix AQ0CN0
pix pix NCMS000
Play Play NCMS000
⚠️ Potential issue

Remove trailing spaces from entries

Several entries contain trailing spaces that should be removed for consistency:

  • Line 136: popular
  • Lines 190-197: Multiple entries including todinha , todinhas , todinho , todinhos

These trailing spaces could cause issues in text processing.

Also applies to: 190-197

🧰 Tools
🪛 LanguageTool

[duplication] ~136-~136: Possible typo: you repeated a word
Context: ...Q0CN0 pix pix NCMS000 Play Play NCMS000 PlayArte PlayArte AQ0CN0 podere podere NCMSV00 popular p...

(ENGLISH_WORD_REPEAT_RULE)
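
Stray whitespace of this kind is easy to detect mechanically. The following Python sketch reports every field that has leading or trailing whitespace, assuming tab-separated fields as stated in the file header; `fields_with_stray_whitespace` is a hypothetical name and the sample lines are made up for illustration.

```python
from typing import List, Tuple


def fields_with_stray_whitespace(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line_number, field) for each field padded with whitespace."""
    flagged: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        content = line.rstrip("\r\n")  # keep spaces, drop only the newline
        if not content.strip() or content.startswith("#"):
            continue
        for field in content.split("\t"):
            if field != field.strip():
                flagged.append((number, field))
    return flagged


if __name__ == "__main__":
    sample = [
        "popular\tpopular\tNCCS000 ",  # trailing space after the tag
        "prime\tprime\tAQ0CN0",
        "todinha \ttodo\tNCFS00D",     # trailing space inside the first field
    ]
    for number, field in fields_with_stray_whitespace(sample):
        print(f"line {number}: {field!r}")
```

Checking per field rather than per line also catches a space before a tab, which a plain end-of-line check would miss.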

PlayArte PlayArte AQ0CN0
podere podere NCMSV00
popular popular NCCS000
prime prime AQ0CN0
Profa Profa NCFS000
PrtSc PrtSc NCCS000
Puma Puma NCCSO00
quentinho quentinho NCMS000
quentinhos quentinho NCMP000
R$ R$ NCCP000
R$ R$ NCCS000
R$ R$ NCMP000
Comment on lines +145 to 148
🛠️ Refactor suggestion

Consolidate duplicate currency symbol entries

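The four 'R$' entries differ only in their gender/number tags. Consider consolidating them into a single entry with an appropriate tag for currency symbols. The committable suggestion covers only three of the four lines, so the remaining R$ R$ NCMS000 entry would have to be removed separately.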

-R$	R$	NCCP000
-R$	R$	NCCS000
-R$	R$	NCMP000
-R$	R$	NCMS000
+R$	R$	Zm
📝 Committable suggestion

Suggested change
 quentinhos quentinho NCMP000
-R$ R$ NCCP000
-R$ R$ NCCS000
-R$ R$ NCMP000
+R$ R$ Zm
🧰 Tools
🪛 LanguageTool

[duplication] ~145-~145: Possible typo: you repeated a word
Context: ...ho NCMS000 quentinhos quentinho NCMP000 R$ R$ NCCP000 R$ R$ NCCS000 R$ R$ NCMP000 R$ ...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~146-~146: Possible typo: you repeated a word
Context: ...ntinhos quentinho NCMP000 R$ R$ NCCP000 R$ R$ NCCS000 R$ R$ NCMP000 R$ R$ NCMS000 Reb...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~147-~147: Possible typo: you repeated a word
Context: ...nho NCMP000 R$ R$ NCCP000 R$ R$ NCCS000 R$ R$ NCMP000 R$ R$ NCMS000 Rebook Rebook NCC...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~148-~148: Possible typo: you repeated a word
Context: ... R$ NCCP000 R$ R$ NCCS000 R$ R$ NCMP000 R$ R$ NCMS000 Rebook Rebook NCCSO00 Red Red N...

(ENGLISH_WORD_REPEAT_RULE)
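
To see which entries carry several tags before deciding what to consolidate, the entries can be grouped by full form and base form. The Python sketch below assumes three tab-separated fields per line; `tags_by_entry` and `multi_tag_entries` are hypothetical names. Multiple tags for one word can be legitimate in a tagger dictionary (for example, pet appears above as both AQ0CN0 and NCMS000), so the output is a review list, not a list of errors.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (fullform, baseform)


def tags_by_entry(lines: List[str]) -> Dict[Key, List[str]]:
    """Group POS tags by (fullform, baseform), preserving file order."""
    grouped: Dict[Key, List[str]] = defaultdict(list)
    for line in lines:
        content = line.rstrip("\r\n")
        if not content.strip() or content.startswith("#"):
            continue
        fields = content.split("\t")
        if len(fields) != 3:
            continue  # malformed lines are out of scope for this sketch
        fullform, baseform, tag = fields
        grouped[(fullform, baseform)].append(tag)
    return dict(grouped)


def multi_tag_entries(lines: List[str]) -> Dict[Key, List[str]]:
    """Keep only the entries that carry more than one tag."""
    return {key: tags for key, tags in tags_by_entry(lines).items() if len(tags) > 1}


if __name__ == "__main__":
    sample = [
        "quentinhos\tquentinho\tNCMP000",
        "R$\tR$\tNCCP000",
        "R$\tR$\tNCCS000",
        "R$\tR$\tNCMP000",
        "R$\tR$\tNCMS000",
    ]
    for (fullform, baseform), tags in multi_tag_entries(sample).items():
        print(fullform, baseform, tags)
```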

R$ R$ NCMS000
Rebook Rebook NCCSO00
Red Red NCMN000
rockstar rockstar NCCS000
rockstars rockstar NCCP000
S S AO0CN0
s s AO0CN0
Samsung Samsung AQ0CN0
@@ -148,6 +164,8 @@ Simaria Simaria NCFSS00
Simarias Simaria NCFPS00
sis sis NCFS000
Skechers Skechers NCCSO00
spalla spalla NCFS000
spallas spalla NCFP000
Sr Sr NCMS000
Sra Sra NCFS000
Sras Sra NCFP000
@@ -160,14 +178,14 @@ tá tá VMIP2S0
tá tá VMIP3S0
tá tá VMN0000
tão tá VMIP3P0
tô tá VMIP1S0
Telecom Telecom NCFSO00
thrasher thrasher NCCS000
thrashers thrasher NCCP000
threshold threshold NCMS000
thresholds threshold NCMP000
tip tip NCFS000
tips tip NCFP000
tô tá VMIP1S0
todinha todo NCFS00D
todinha todo NCFS00D
todinhas todo NCFP00D
@@ -192,25 +210,7 @@ whiskas whiskas NCFP000
width width NCFS000
widths width NCFP000
yes-man yes-man NCMS000
Ophiuchus Ophiuchus AQ0CN0
mirrorless mirrorless AQ0CN0
desenrolar desenrolar NCMN000
hidro hidro NCFS000
quentinho quentinho NCMS000
quentinhos quentinho NCMP000
rockstar rockstar NCCS000
rockstars rockstar NCCP000
frontman frontman NCMS000
Play Play NCMS000
spalla spalla NCFS000
spallas spalla NCFP000
Profa Profa NCFS000
Airbus Airbus NCCNO00
Ed Ed NCMSS00
Eds Ed NCMPS00
EF EF NCCS000
BT BT AQ0CN0
Linux Linux AQ0CN0


# Added by Marco A.G.Pinto. Pedro told me to add here until he fixes the dictionary building.
autoinflingir autoinflingir VMN0000
@@ -252,5 +252,6 @@ Tolstoy Tolstoy NPMS000
Tolstoy Tolstoy NPMSS00
videogravação videogravação NCFS000
videogravações videogravação NCFP000
Whirlpool Whirlpool NPFS000
Woodcock Woodcock NPMS000
WritingTool WritingTool NPMN000