Skip to content

Commit

Permalink
feat: mark invalid entries as fuzzy
Browse files Browse the repository at this point in the history
  • Loading branch information
OmarIthawi committed Oct 27, 2023
1 parent bd63605 commit 2337147
Show file tree
Hide file tree
Showing 8 changed files with 458 additions and 111 deletions.
130 changes: 130 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
# Run pull request tests and validation for openedx-translations

name: CI

on:
- pull_request

jobs:
# Run unit and integration tests for Python
validate-po-files: # TODO: rename to `tests` and update the `main` branch protection rule
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
statuses: write
steps:
- name: clone openedx/openedx-translations
uses: actions/checkout@v3

- name: Install gettext
run: sudo apt install -y gettext

- name: setup python
uses: actions/setup-python@v4
with:
python-version: '3.8'

- name: Run python tests
run: |
make test_requirements
make test
- name: Allow editing translation files for bot pull requests
env:
# secrets can't be used in job conditionals, so we set them to env here
TRANSIFEX_APP_ACTOR_NAME: "${{ secrets.TRANSIFEX_APP_ACTOR_NAME }}"
TRANSIFEX_APP_ACTOR_ID: "${{ secrets.TRANSIFEX_APP_ACTOR_ID }}"
if: "${{ github.actor == env.TRANSIFEX_APP_ACTOR_NAME && github.actor_id == env.TRANSIFEX_APP_ACTOR_ID }}"
run: |
echo "VALIDATION_OPTIONS=--mark-fuzzy" >> "$GITHUB_ENV"
- name: Validate translation files
id: validate_translation_files
run: |
has_errors=0
python scripts/validate_translation_files.py $VALIDATION_OPTIONS 2>error_log.txt || has_errors=1
cat error_log.txt # Print the errors to the console for debugging
# Save the validation errors to an output variable
{
echo 'ERROR_LOG<<EOF'
fold -w 100 -s error_log.txt
echo EOF
} >> "$GITHUB_OUTPUT"
exit $has_errors
- name: Commit fixes to git
id: commit_fixes
if: ${{ github.event.repository.full_name == github.event.pull_request.head.repo.full_name }}
run: |
# Commit if there are changes to translation files
if ! git diff --no-ext-diff --quiet --exit-code; then
# Set the git user to the bot user to enable commit
git config --global user.email "[email protected]"
git config --global user.name "edx-transifex-bot"
# Switch from the merge commit to the pull request branch to enable push
git checkout "${{ github.head_ref }}"
git add translations/
git commit --message "fix: mark invalid entries as fuzzy" translations/
git push
echo "FUZZY_FIX_COMMIT_SHA=$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"
else
echo "No changes to commit"
fi
- name: Allow merging the fuzzy fix commit
uses: actions/github-script@v6
if: ${{ steps.commit_fixes.outputs.FUZZY_FIX_COMMIT_SHA }}
with:
script: |
await github.rest.repos.createCommitStatus({
context: 'validate-po-files', // TODO: rename to `tests` and update the `main` branch protection rule
description: 'ci.yml: Forced success status.',
owner: context.repo.owner,
repo: context.repo.repo,
sha: '${{ steps.commit_fixes.outputs.FUZZY_FIX_COMMIT_SHA }}',
state: 'success',
})
- name: Post translation validation results as a pull request comment
# Due to GitHub Actions security reasons posting a comment isn't possible on fork pull requests.
# This shouldn't be an issue, because bots writes directly to this repository.
if: ${{ always() && github.event.repository.full_name == github.event.pull_request.head.repo.full_name }}
uses: mshick/add-pr-comment@7c0890544fb33b0bdd2e59467fbacb62e028a096
with:
message: |
:white_check_mark: All translation files are valid.
```
${{ steps.validate_translation_files.outputs.ERROR_LOG || 'No errors were reported.' }}
```
This comment has been posted by the `validate-translation-files.yml` GitHub Actions workflow.
message-failure: |
:warning: There are errors in the translation files:
```
${{ steps.validate_translation_files.outputs.ERROR_LOG || 'No errors were reported.' }}
```
This comment has been posted by the `validate-translation-files.yml` GitHub Actions workflow.
- name: Auto-merge pull request
env:
# secrets can't be used in job conditionals, so we set them to env here
TRANSIFEX_APP_ACTOR_NAME: "${{ secrets.TRANSIFEX_APP_ACTOR_NAME }}"
TRANSIFEX_APP_ACTOR_ID: "${{ secrets.TRANSIFEX_APP_ACTOR_ID }}"
# This token requires Write access to the openedx-translations repo
GITHUB_TOKEN: ${{ secrets.EDX_TRANSIFEX_BOT_GITHUB_TOKEN }}
PR_NUMBER: ${{ github.event.number }}
if: "${{ github.actor == env.TRANSIFEX_APP_ACTOR_NAME && github.actor_id == env.TRANSIFEX_APP_ACTOR_ID }}"
run: |
# Add the pull request to the merge queue with --rebase commit strategy
gh pr merge ${{ github.head_ref }} --rebase --auto
31 changes: 0 additions & 31 deletions .github/workflows/python-tests.yml

This file was deleted.

56 changes: 0 additions & 56 deletions .github/workflows/validate-translation-files.yml

This file was deleted.

7 changes: 6 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,14 @@ test_requirements: ## Installs test.txt requirements
test: ## Run scripts tests
pytest -v -s --cov=. --cov-report=term-missing --cov-report=html scripts/tests

validate_translation_files: ## Run basic validation to ensure files are compilable
validate_translation_files: ## Run basic validation to ensure translation files are valid
python scripts/validate_translation_files.py


validate_translation_files_mark_fuzzy: ## Run basic validation and mark invalid entries as fuzzy
python scripts/validate_translation_files.py --mark-fuzzy


sync_translations: ## Syncs from the old projects to the new openedx-translations project
python scripts/sync_translations.py $(SYNC_ARGS)

Expand Down
96 changes: 96 additions & 0 deletions docs/decisions/0002-mark-fuzzy-entries.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
Mark invalid entries in po files as fuzzy
#########################################

Context
*******
As of the `OEP-58`_, the Transifex GitHub App is used to sync Translations.
These translations are validated by `LexiQA`_, built-in Transifex quality
checks, and human reviewers. During this synchronization process, the
`Transifex GitHub App`_ creates pull requests to this repository. The
translations in the pull requests are then validated by ``msgfmt`` in CI.

Problem
*******
There are times when the translations being synchronized fail ``msgfmt``
validation. This prevents the pull requests from being merged.


Design Decision
***************

A GitHub Actions workflow will be implemented to mark invalid entries in
synchronized ``.po`` files as ``fuzzy``. This will update pull requests
created by the `Transifex GitHub App`_.

To ensure a safe and reliable workflow, the following workflows will be
combined into one single workflow with multiple jobs:

#. Run ``python-tests.yml`` to validate the python code
#. Then, run ``validate-translation-files.yml`` which performs the following:

#. Validate the po-files using ``msgfmt``
#. Notify the translators about the invalid entries via the preferred
communication channel (Slack, Transifex, GitHub)
#. Edit the po files to mark invalid entries as ``fuzzy``, so it's
excluded from ``.mo`` files
#. Revalidate the files

#. Commit the updated entries and push to the PR branch
#. Automatically merge the PR


Informing the Translators
*************************
Translators can be notified about invalid translations via Slack, Transifex
or GitHub issues depending on the community's preference.

Pros and Cons
*************

**Pros**

* New valid strings would make it into the ``.mo`` files
* There's no need for manual intervention, therefore it's fast and won't
create a backlog of pull requests.
* Rejected strings are easily identifiable by looking in the code, so it's
easy to fix them.
* Translators can be notified about invalid translations via Slack, Transifex,
GitHub depending on the community's preference.


**Cons**

* The workflow script runs and edits the pull request, which can be
confusing for the reviewers.
* Previously valid entries are going to be overwritten with new ``fuzzy``
strings which will make those entries to be shown in source language
(English).

Rejected Alternative: Keep pull requests open
*********************************************

- **Create a Transifex issue only:** It would be possible to identify the
faulty entries and create a Transifex issue via API to the faulty entries.
However, this would require attention from the translators
team, which can take a rather long time therefore creating a backlog of
invalid entries. Additionally, Transifex issues are not understood or used
by most of the Open edX community.

- **Post errors on Slack only:** Post the errors on Slack and ask the
translators
to fix them. Same as the previous option, not all translators are on Slack.
Additionally, this option would have a slow feedback loop causing the pull
requests backlog to build-up.

- **Mark faulty entries as unreviewed only:** Use the Transifex API to mark
the the invalid entries as unreviewed, then pull only
reviewed entries into this repository.
This option would require an extensive use the of the Transifex API,
and pull again which can be complex to implement. Additionally, this option
would have a slow feedback loop causing the pull requests backlog to
build-up.


.. _OEP-58: https://open-edx-proposals.readthedocs.io/en/latest/architectural-decisions/oep-0058-arch-translations-management.html
.. _LexiQA: https://help.transifex.com/en/articles/6219179-lexiqa
.. _Transifex GitHub App: https://github.com/apps/transifex-integration
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,51 @@ msgstr ""
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"Generated-By: Babel 2.8.0\n"


#: default_data.py:5
msgid ""
"An isosceles triangle with three layers of similar height. It is shown "
"upright, so the widest layer is located at the bottom, and the narrowest "
"layer is located at the top."
msgstr ""

#, python-brace-format
msgid "{action} invalid actions."
msgstr "{something_else} an invalid action."

#: default_data.py:20
msgid ""
"Use this zone to associate an item with the bottom layer of the triangle."
msgstr ""

#: default_data.py:22
msgid "Correct! This one belongs to The Top Zone."
msgstr ""

#, python-brace-format
msgid "{action} is not a valid action."
msgstr "{कार्रवाई} एक वैध कार्रवाई नहीं है।" # Invalid brace-format with encoding errors
msgid ""
"{action2} another invalid action."
msgstr "{कार्रवाई2} एक वैध कार्रवाई नहीं है।"

#: default_data.py:23
msgid "Correct! This one belongs to The Middle Zone."
msgstr ""

#: default_data.py:24
msgid "Correct! This one belongs to The Bottom Zone."
msgstr ""

#: default_data.py:25
msgid "No, this item does not belong here. Try again."
msgstr "नहीं, यह आइटम यहाँ नहीं है। पुनः प्रयास करें।"

#: default_data.py:26
msgid "You silly, there are no zones for this one."
msgstr "आप मूर्खतापूर्ण हैं, इसके लिए कोई क्षेत्र नहीं है।"

#, python-brace-format
msgid "{action3} one more time!"
msgstr ""
"{कार्रवाई3} एक वैध कार्रवाई नहीं है।"
""

Loading

0 comments on commit 2337147

Please sign in to comment.