-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pt] Improved/Fixed rules in grammar.xml and style.xml #11160
Conversation
WalkthroughThe pull request introduces modifications to the Portuguese language modules of LanguageTool, specifically updating the Changes
Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml (1)
1645-1650
: Good example coverage, consider adding negative test cases.The new examples effectively demonstrate valid uses of "grande número" in various contexts. However, to ensure robust rule testing, consider adding negative test cases (examples that should trigger the rule).
Here's a suggested negative example to add:
<example correction='aglomerado'>O estudo detectou <marker>grandes números</marker> de galáxias distantes.</example>
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/grammar.xml
(3 hunks)languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml
(2 hunks)
🔇 Additional comments (5)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml (2)
1620-1622
: Well-structured antipattern enhancement!The regex modification to include both singular and plural forms (
leis?|propriedades?|teoremas?|teorias?
) is a good improvement. It correctly preserves mathematical and statistical terminology like "lei dos grandes números" (law of large numbers) from being flagged.
1633-1633
: Excellent POS tag-based exception handling!The updated exception with
postag_regexp='yes' postag='SENT_START|SPS00|V.+'
provides more precise control over rule application by considering grammatical context (sentence starts, prepositions, and verbs).languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/grammar.xml (3)
1513-1516
: Well-documented antipattern with clear examples!The added comment "outros/outras #1" improves code readability by clearly indicating the purpose of this antipattern. The pattern correctly handles both singular and plural forms of the expression.
1531-1536
: Well-structured new antipattern for pronoun-verb-noun agreement.The new antipattern effectively handles pronoun-verb-noun agreement cases. Consider adding more examples to demonstrate different gender combinations for better coverage.
Consider adding more examples to cover different gender combinations:
<example>A Maria achou que eu tinha lhe dito verdade.</example> <example>O João pensou que eu tinha lhe feito promessa.</example>✅ Verification successful
Postag patterns are consistent with existing rules
The verification shows that the new antipattern's postag patterns are consistent with existing usage in the codebase:
- The pronoun pattern
PP3C.+
is used in both grammar.xml and style.xml with similar regexp patterns- The verb pattern
VMP00.M
is widely used across the rules, particularly in similar gender-agreement contexts- The noun/adjective pattern
AQ.[CF].+|N.[CF].+
is consistently used throughout grammar.xml for similar agreement rulesThe suggested additional examples would indeed improve coverage, but the current implementation follows established patterns in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the consistency of postag patterns # Check for similar pronoun patterns echo "Checking pronoun patterns (PP3C.+):" rg "postag='PP3C.+'" languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/ # Check for similar verb patterns echo -e "\nChecking verb patterns (VMP00.M):" rg "postag='VMP00.M'" languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/ # Check for similar noun/adjective patterns echo -e "\nChecking noun/adjective patterns (AQ.[CF].+|N.[CF].+):" rg "postag='AQ.\[CF\].+\|N.\[CF\].+'" languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/Length of output: 11011
1522-1525
: Verify the complex postag pattern.The added comment "outros/outras #2" improves code clarity. However, let's verify the complex postag pattern
(SPS00:)?[DP]..[CF].+
:✅ Verification successful
Based on the search results, I can see that similar postag patterns are widely used throughout the grammar rules in the Portuguese language module. The pattern
(SPS00:)?[DP]..[CF].+
follows the same structure as other patterns in the codebase, particularly for handling gender-specific rules. For example:
postag='DA0F.+|SPS00:DA0F.+|PP.F.+|SPS00:PP.F.+'
postag='PP.FS.+|SPS00:PP.FS.+|DD0FS.+|SPS00:DD0FS.+'
postag='DI.FS.+|SPS00:D..FS.+|SPS00:...FS.+'
These patterns consistently use:
- Optional
SPS00:
prefix (preposition)[DP]
for determiners/pronouns[CF]
for gender (feminine/common).+
suffix for additional attributesLet's verify the specific components with another search:
Based on the search results, I can verify that the postag pattern
(SPS00:)?[DP]..[CF].+
is consistent with the codebase's usage. The pattern is used in similar forms throughout the Portuguese language module, particularly for handling gender-specific rules. Let's break down the evidence:
The
SPS00:
prefix is commonly used to match prepositions, as seen in many patterns like:
SPS00:DA0F.+
SPS00:D[AI].+
SPS00:[DP][ADIPR].+
The
[DP]
component for determiners/pronouns is used consistently:
[DP][ADIPR].+
[DP]..[CF].+
[DP][ADIP]0MS0
The
[CF]
gender pattern is used in similar contexts:
AQ.[CF].+|N.[CF].+
NC[CF].+
[DP]..[CF].+
The
.+
suffix is standard for matching additional attributes.The pattern follows the established conventions in the codebase for handling gender-specific rules in Portuguese grammar.
The postag pattern is correctly structured and consistent with the codebase's conventions.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the postag pattern usage in other rules # This helps ensure consistency and correctness of the pattern # Search for similar postag patterns in grammar rules rg -A 2 "postag='(SPS00:)?[DP]..[CF].+'" languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/Length of output: 37626
Script:
#!/bin/bash # Search for rules using similar gender patterns rg -A 2 'postag=.*\[DP\].*\[CF\]' languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/ # Search for rules using SPS00 prefix rg -A 2 'postag=.*SPS00:.*' languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/Length of output: 108825
Antipattern in a rule and fixes in other.
Summary by CodeRabbit
Release Notes
New Features
Improvements
Bug Fixes