Skip to content

Commit

Permalink
feat: mark invalid entries as fuzzy
Browse files Browse the repository at this point in the history
  • Loading branch information
OmarIthawi committed Oct 26, 2023
1 parent 39d2ed1 commit c4ed038
Show file tree
Hide file tree
Showing 6 changed files with 335 additions and 24 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/validate-translation-files.yml
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,10 @@ jobs:
message: |
:white_check_mark: All translation files are valid.
```
${{ steps.validate_translation_files.outputs.VALIDATION_ERRORS }}
```
This comment has been posted by the `validate-translation-files.yml` GitHub Actions workflow.
message-failure: |
Expand Down
7 changes: 6 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,14 @@ test_requirements: ## Installs test.txt requirements
test: ## Run scripts tests
pytest -v -s --cov=. --cov-report=term-missing --cov-report=html scripts/tests

validate_translation_files: ## Run basic validation to ensure files are compilable
validate_translation_files: ## Run basic validation to ensure translation files are valid
python scripts/validate_translation_files.py


validate_translation_files_mark_fuzzy: ## Run basic validation and mark invalid entries as fuzzy
python scripts/validate_translation_files.py --mark-fuzzy


sync_translations: ## Syncs from the old projects to the new openedx-translations project
python scripts/sync_translations.py $(SYNC_ARGS)

Expand Down
99 changes: 99 additions & 0 deletions docs/decisions/0002-mark-fuzzy-entries.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
Mark Transifex invalid entries as fuzzy
#######################################

Context
*******
As of the `OEP-58`_, the Transifex GitHub App is used to sync Translations.
These translations are validated by `LexiQA`_, built-in Transifex quality
checks, and human reviewers. During this synchronization process, the
`Transifex GitHub App`_ creates pull requests to this repository. The
translations in the pull requests are then validated by ``msgfmt`` in CI.

Problem
*******
There are times when the translations being synchronized fail ``msgfmt``
validation. This prevents the pull requests from being merged.


Design Decision
***************

A GitHub Actions workflow will be implemented to mark invalid entries in
synchronized ``.po`` files as ``fuzzy``. This will update pull requests
created by the `Transifex GitHub App`_.

To ensure a safe and reliable workflow, the following workflows will be
combined into one single workflow with multiple jobs:

#. Run ``python-tests.yml`` to validate the python code
#. Then, run ``validate-translation-files.yml`` which performs the following:

#. Validate the po-files using ``msgfmt``
#. Notify the translators about the invalid entries via the preferred
communication channel (Slack, Transifex, GitHub)
#. Edit the po files to mark invalid entries as ``fuzzy``, so it's
excluded from ``.mo`` files
#. Revalidate the files

#. Commit the fixes
#. Run the lint commits
#. Push to the branch
#. Trigger auto merge


Informing the Translators
*************************
Translators can be notified about invalid translations via Slack, Transifex
or GitHub issues depending on the community's preference.

Pros and Cons
*************

**Pros**

* New valid strings would make it into the ``.mo`` files
* There's no need for manual intervention, therefore it's fast and won't
create a backlog of pull requests.
* Rejected strings are easily identifiable by looking in the code, so it's
easy to fix them.
* Translators can be notified about invalid translations via Slack, Transifex,
GitHub depending on the community's preference.


**Cons**

* The workflow script runs and edits the pull request, which can be
confusing for the reviewers.
* Previously valid entries are going to be overwritten with new ``fuzzy``
strings which will make those entries to be shown in source language
(English).

Rejected Alternative: Keep pull requests open
*********************************************

- **Create a Transifex issue only:** It would be possible to identify the
faulty entries and create a Transifex issue via API to the faulty entries.
However, this would require attention from the translators
team, which can take a rather long time therefore creating a backlog of
invalid entries. Additionally, Transifex issues are not understood or used
by most of the Open edX community.

- **Post errors on Slack only:** Post the errors on Slack and ask the
translators
to fix them. Same as the previous option, not all translators are on Slack.
Additionally, this option would have a slow feedback loop causing the pull
requests backlog to build-up.

- **Mark faulty entries as unreviewed only:** Use the Transifex API to mark
the the invalid entries as unreviewed, then pull only
reviewed entries into this repository.
This option would require an extensive use the of the Transifex API,
and pull again which can be complex to implement. Additionally, this option
would have a slow feedback loop causing the pull requests backlog to
build-up.


.. _OEP-58: https://open-edx-proposals.readthedocs.io/en/latest/architectural-decisions/oep-0058-arch-translations-management.html
.. _LexiQA: https://help.transifex.com/en/articles/6219179-lexiqa
.. _Transifex GitHub App: https://github.com/apps/transifex-integration

Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,51 @@ msgstr ""
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"Generated-By: Babel 2.8.0\n"


#: default_data.py:5
msgid ""
"An isosceles triangle with three layers of similar height. It is shown "
"upright, so the widest layer is located at the bottom, and the narrowest "
"layer is located at the top."
msgstr ""

#, python-brace-format
msgid "{action} invalid actions."
msgstr "{something_else} an invalid action."

#: default_data.py:20
msgid ""
"Use this zone to associate an item with the bottom layer of the triangle."
msgstr ""

#: default_data.py:22
msgid "Correct! This one belongs to The Top Zone."
msgstr ""

#, python-brace-format
msgid "{action} is not a valid action."
msgstr "{कार्रवाई} एक वैध कार्रवाई नहीं है।" # Invalid brace-format with encoding errors
msgid ""
"{action2} another invalid action."
msgstr "{कार्रवाई2} एक वैध कार्रवाई नहीं है।"

#: default_data.py:23
msgid "Correct! This one belongs to The Middle Zone."
msgstr ""

#: default_data.py:24
msgid "Correct! This one belongs to The Bottom Zone."
msgstr ""

#: default_data.py:25
msgid "No, this item does not belong here. Try again."
msgstr "नहीं, यह आइटम यहाँ नहीं है। पुनः प्रयास करें।"

#: default_data.py:26
msgid "You silly, there are no zones for this one."
msgstr "आप मूर्खतापूर्ण हैं, इसके लिए कोई क्षेत्र नहीं है।"

#, python-brace-format
msgid "{action3} one more time!"
msgstr ""
"{कार्रवाई3} एक वैध कार्रवाई नहीं है।"
""

58 changes: 53 additions & 5 deletions scripts/tests/test_validate_translation_files.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,12 @@
"""

import os.path
import re
import shutil

from ..validate_translation_files import (
get_translation_files,
main,
validate_translation_files,
)

SCRIPT_DIR = os.path.dirname(__file__)
Expand Down Expand Up @@ -35,7 +37,7 @@ def test_main_on_invalid_files(capsys):
Integration test for the `main` function on some invalid files.
"""
mock_translations_dir = os.path.join(SCRIPT_DIR, 'mock_translations_dir')
exit_code = main(mock_translations_dir)
exit_code = validate_translation_files(mock_translations_dir)
out, err = capsys.readouterr()

assert 'VALID:' in out, 'Valid files should be printed in stdout'
Expand All @@ -44,8 +46,8 @@ def test_main_on_invalid_files(capsys):
assert 'hi/LC_MESSAGES/django.po' not in out, 'Invalid file should be printed in stderr'
assert 'en/LC_MESSAGES/django.po' not in out, 'Source file should not be validated'

assert 'INVALID:' in err
assert 'hi/LC_MESSAGES/django.po' in err
assert re.match(r'INVALID: .*hi/LC_MESSAGES/django.po', err)
assert 'Errors:' in err
assert '\'msgstr\' is not a valid Python brace format string, unlike \'msgid\'' in err
assert 'FAILURE: Some translations are invalid.' in err

Expand All @@ -57,11 +59,57 @@ def test_main_on_valid_files(capsys):
Integration test for the `main` function but only for the Arabic translations which is valid.
"""
mock_translations_dir = os.path.join(SCRIPT_DIR, 'mock_translations_dir/demo-xblock/conf/locale/ar')
exit_code = main(mock_translations_dir)
exit_code = validate_translation_files(mock_translations_dir)
out, err = capsys.readouterr()

assert 'VALID:' in out, 'Valid files should be printed in stdout'
assert 'ar/LC_MESSAGES/django.po' in out, 'Arabic translation file is valid'
assert 'SUCCESS: All translation files are valid.' in out
assert 'Errors:' not in out, 'There should be no errors'
assert not err.strip(), 'The stderr should be empty'
assert exit_code == 0, 'Should succeed due in validating the ar/LC_MESSAGES/django.po file'


def test_main_on_valid_files_with_mark_fuzzy(capsys, tmp_path):
"""
Test the `main` function with the --mark-fuzzy option.
"""
original_mock_translations_dir = os.path.join(SCRIPT_DIR, 'mock_translations_dir')
mock_translations_dir = tmp_path / 'translations'
shutil.copytree(original_mock_translations_dir, mock_translations_dir)

exit_code = validate_translation_files(translations_dir=mock_translations_dir, mark_fuzzy=True)
out, err = capsys.readouterr()

assert 'VALID:' in out, 'Valid files should be printed in stdout'
assert re.match(r'FIXED: .*hi/LC_MESSAGES/django.po', err), 'Should print the FIXED files in stderr'
assert 'ar/LC_MESSAGES/django.po' in out, 'Arabic translation file is valid regardless'
assert 'NOTICE: Some translations were fixed.' in err, 'Should print NOTICE in stderr'
assert 'SUCCESS: All translation files are now valid.' in err, 'Should print SUCCESS in stderr'
assert 'Previous errors:' in err, 'Original errors should be printed'
assert 'New output:' in err, 'New output should be printed after the fix'
assert exit_code == 0, (
'Should succeed even though the in/LC_MESSAGES/django.po is invalid, because it was marked as fuzzy'
)


def test_main_on_valid_files_with_mark_fuzzy_unfixable_files(capsys, tmp_path):
"""
Test the `main` function with the --mark-fuzzy option but the file is broken in an unknown way.
"""
original_mock_translations_dir = os.path.join(SCRIPT_DIR, 'mock_translations_dir')
mock_translations_dir = tmp_path / 'translations'
shutil.copytree(original_mock_translations_dir, mock_translations_dir)
hi_language_file = mock_translations_dir / 'demo-xblock/conf/locale/hi/LC_MESSAGES/django.po'

with open(hi_language_file, 'a') as f:
f.write('<<<<Some random string to break the file>>>>')

exit_code = validate_translation_files(translations_dir=mock_translations_dir, mark_fuzzy=True)
out, err = capsys.readouterr()

assert 'VALID:' in out, 'Valid files should be printed in stdout regardless'
assert re.match(r'INVALID: .*hi/LC_MESSAGES/django.po', err), 'Should print the Error files in stderr'
assert 'Previous errors:' in err, 'Original errors should be printed'
assert 'New errors:' in err, 'New errors should be printed after the fix'
assert exit_code == 1, 'Should fail due to unfixable hi/LC_MESSAGES/django.po file'
Loading

0 comments on commit c4ed038

Please sign in to comment.