This project aims to generate a Python module which provides translations for the Unicode descriptions found in the unicodedata module. The translations come from unicode-table.com, whose source code is on GitHub. From that data, this project generates PO and MO files.
Note: these files are also useful for other programming languages. An overview of the supported languages can be found here.
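For reference, the English descriptions in question are the character names returned by unicodedata.name(). The helper describe below is purely illustrative and not part of this project; it only shows what those names look like.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (illustrative helper)."""
    # Control characters such as U+0007 have no name, so unicodedata.name()
    # would raise ValueError without the default argument.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"

for ch in ["A", "\u20ac", "\u00e9", "\x07"]:
    print(describe(ch))
```

This prints LATIN CAPITAL LETTER A, EURO SIGN and LATIN SMALL LETTER E WITH ACUTE for the first three characters, and the placeholder for the control character.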
This localization has been discussed in:
- https://bugs.python.org/issue34053
- https://mail.python.org/pipermail/python-ideas/2018-July/051889.html
- https://groups.google.com/forum/#!topic/python-ideas/g2jj4WRVDFA
Install the following packages:
sudo apt-get install wget unzip python3 gettext
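Before running the scripts, it can help to check that the required command-line tools are on the PATH. This is a small sketch; it assumes the scripts use msgfmt, the compiler shipped with the gettext package, and the list of tool names is an assumption rather than something taken from the scripts themselves.

```python
import shutil

# Tools the build scripts are assumed to need; msgfmt comes with gettext.
REQUIRED_TOOLS = ["wget", "unzip", "python3", "msgfmt"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools found.")
```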
To generate the files needed for a Python module with translations of Unicode descriptions, first run
./1-clean.sh
which removes the output of previous runs. Then run
./2-download.sh
to download the translations into master.zip. These are unzipped with
./3-extract.sh
into the directory unicode-table-data-master. The Python script
./4-generate.py
then generates PO files in a tree under the directory locale, such as:
- cn/LC_MESSAGES
- de/LC_MESSAGES
- fr/LC_MESSAGES
- ...
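4-generate.py itself is not reproduced here. As a rough illustration of what a generated PO file contains, the sketch below formats msgid/msgstr pairs behind a minimal header entry. The function names are hypothetical, and the real script may write more header fields or order entries differently.

```python
def po_quote(text: str) -> str:
    """Quote a string for a PO file, escaping backslashes, quotes and newlines."""
    # Backslashes must be escaped first, otherwise the escapes added for
    # quotes and newlines would themselves be doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def po_entry(msgid: str, msgstr: str) -> str:
    """Format one msgid/msgstr pair as a PO entry."""
    return f"msgid {po_quote(msgid)}\nmsgstr {po_quote(msgstr)}\n"

def po_file(pairs, language: str) -> str:
    """Build a minimal PO file: header entry (empty msgid), then one entry per pair."""
    header = (
        'msgid ""\n'
        'msgstr ""\n'
        '"Content-Type: text/plain; charset=UTF-8\\n"\n'
        f'"Language: {language}\\n"\n'
    )
    # Joining with "\n" leaves a blank line between entries.
    return "\n".join([header] + [po_entry(i, s) for i, s in pairs])

# Illustrative translation only, not taken from unicode-table.com.
print(po_file([("LATIN CAPITAL LETTER A", "LATEINISCHER GROSSBUCHSTABE A")], "de"))
```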
The script ./4-generate.py also writes informational messages, warnings and errors to the command line. Languages are skipped if less than 1% of the descriptions have been translated, or if 10% of the translations are identical to the original text.
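The skip rule can be expressed as a small predicate. This is a hypothetical reimplementation: it assumes the 10% threshold is measured against the translated entries and that reaching exactly 10% already triggers a skip, neither of which is confirmed by the text above.

```python
def should_skip(total: int, translated: int, identical: int) -> bool:
    """Decide whether to skip a language (hypothetical sketch).

    total: number of Unicode descriptions considered
    translated: how many of them have a translation
    identical: how many translations equal the English source text
    """
    if total == 0 or translated == 0:
        return True
    # Skip if less than 1% of the descriptions have been translated...
    if translated / total < 0.01:
        return True
    # ...or if 10% or more of the translations are just the English text.
    return identical / translated >= 0.10
```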
Warnings are also shown when source texts are identical. This happens for
<Control>
and for many ideographs, and needs further investigation, as each source
text (msgid) in a PO file must be unique.
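Duplicate source texts are easy to find with a counter, as in the sketch below (duplicate_sources is a hypothetical name). One possible remedy, not something this project currently does, is to give each entry a msgctxt such as its code point: gettext keys entries on the pair (msgctxt, msgid), and Python 3.8 and later can look them up with pgettext(context, message).

```python
from collections import Counter

def duplicate_sources(source_texts):
    """Return the source texts that occur more than once, with their counts."""
    counts = Counter(source_texts)
    return {text: n for text, n in counts.items() if n > 1}

names = ["<Control>", "LATIN CAPITAL LETTER A", "<Control>", "<Control>"]
print(duplicate_sources(names))  # {'<Control>': 3}
```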
The PO files can be converted to MO files by running
./5-convert.sh
This results in the following files under the directory locale:
- cn/LC_MESSAGES/symbols.po
- cn/LC_MESSAGES/symbols.mo
- de/LC_MESSAGES/symbols.po
- de/LC_MESSAGES/symbols.mo
- fr/LC_MESSAGES/symbols.po
- fr/LC_MESSAGES/symbols.mo
- ...
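The contents of 5-convert.sh are not shown here. Assuming it compiles each catalogue with msgfmt from the gettext package, a Python equivalent could look like the sketch below; mo_commands and convert are hypothetical names.

```python
import subprocess
from pathlib import Path

def mo_commands(locale_dir="locale"):
    """Build one msgfmt command per PO file under <locale_dir>/<lang>/LC_MESSAGES/."""
    commands = []
    for po in sorted(Path(locale_dir).glob("*/LC_MESSAGES/*.po")):
        # "msgfmt -o out.mo in.po" compiles a PO file into a binary MO file
        # written next to it, e.g. symbols.po -> symbols.mo.
        commands.append(["msgfmt", "-o", str(po.with_suffix(".mo")), str(po)])
    return commands

def convert(locale_dir="locale"):
    """Run msgfmt for every PO file; raises CalledProcessError on failure."""
    for cmd in mo_commands(locale_dir):
        subprocess.run(cmd, check=True)
```

Calling convert() from the project root would then compile every catalogue found under locale.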
The files in locale can be packaged and distributed, e.g. via PyPI, or could
eventually become part of the Python distribution. As noted above, this
localization can also be used from other programming languages.
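Once the MO files exist, Python's standard gettext module can load them. This sketch assumes that each msgid is exactly the name returned by unicodedata.name(), which is how the project describes its source texts but is not verified here; translated_name is a hypothetical helper. With fallback=True, a missing catalogue or entry simply yields the English name.

```python
import gettext
import unicodedata

def translated_name(ch, language, localedir="locale"):
    """Look up ch's name in <localedir>/<language>/LC_MESSAGES/symbols.mo.

    Falls back to the English name if no catalogue or no entry exists.
    """
    translation = gettext.translation(
        "symbols", localedir=localedir, languages=[language], fallback=True
    )
    return translation.gettext(unicodedata.name(ch))

print(translated_name("\u20ac", "de"))
```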
Copyright information for the translated strings can be found at unicode-table.com. The scripts in this project are in the public domain.