Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 15 changed files with 116 additions and 40 deletions.
@@ -0,0 +1,10 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [0.1.0] - 20180425
### Added
* initial release
@@ -1,54 +1,45 @@
# Tibetan Phonetics Engine

The goal of this repository is to:
- provide an engine to interpret Tibetan in various phonetic transcription schemes
- implement the rules in [Tournadre](http://www.worldcat.org/oclc/916715611) (intro and Ann. 2) as a starting point
- implement the Chinese transcription letting a speaker of Mandarin pronounce or chant Tibetan scriptures

## Description

Ideally the engine will solely use configuration files, so that it can be phonetic scheme agnostic (no phonetics hardcoded).
The goal of this code is to provide a library to:
- implement the conversion of a Tibetan Unicode word into IPA, according to different schemes / dialects
- implement some conversions between IPA and phonetics readable by people with various language backgrounds (Chinese, English, etc.)

The various steps (for the Tournadre scheme, which is the most complex) will be:
- Tibetan Unicode -> Phonological scheme (given in Tournadre)
- Phonological scheme -> IPA (according to Annex 2 of Tournadre)
- IPA -> phonetic scheme
The primary focus of this library is literary pronunciation, ideally representing how an umze would pronounce a traditional text, but contributions for other uses are welcome. We also do not handle Sanskrit transliteration (this can be done through custom exception lists).

The Chinese is produced by a streamlined phonetic scheme in order to match the Mandarin phonology (vowels have been simplified and most of the Tibetan suffixes removed).
Note that this library integrates no segmenter and must be applied to each word separately. You can use it in combination with [pybo](https://github.com/Esukhia/pybo/) to get the phonetics of full sentences.

## Phonetics methods

We currently provide two phonetic schemes:

#### Manual of Standard Tibetan (by Tournadre)

#### Colloquial Amdo Tibetan (by Kuo-ming Sung and Lha Byams Rgyal)

## Outputs

In brief, the generated IPA is converted to Zhuyin, also called Bopomofo (the phonetic notation used for Mandarin in Taiwan), which in turn is mapped to the chosen Traditional Chinese characters.
Apart from raw IPA, we provide the following output possibilities:

We focus exclusively on literary pronunciation, and have options for reading pronunciation or oral pronunciation. Our focus is to be able to express how an umze would pronounce a traditional text.
#### Chinese phonetics

The Chinese is produced by a streamlined phonetic scheme in order to match the Mandarin phonology (vowels have been simplified and most of the Tibetan suffixes removed).

To produce the final output, we first transform the generated IPA into [Zhuyin](https://en.wikipedia.org/wiki/Bopomofo), and then the Zhuyin into Traditional Chinese characters, using a manually built correspondence list.

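The two-stage lookup described for the Chinese output (IPA to Zhuyin, then Zhuyin to Traditional Chinese characters) could be sketched as follows. This is only an illustration: the function name `ipa_to_chinese` and both tables are hypothetical, and the entries are toy placeholders rather than the project's actual correspondence lists.

```python
# Hypothetical toy tables: the real correspondence lists are built manually
# by the project and are not reproduced here.
IPA_TO_ZHUYIN = {"pa": "ㄅㄚ", "ma": "ㄇㄚ"}
ZHUYIN_TO_HANZI = {"ㄅㄚ": "巴", "ㄇㄚ": "瑪"}


def ipa_to_chinese(ipa_syllables):
    """Map each IPA syllable to Zhuyin, then each Zhuyin syllable to a character.

    Syllables missing from either table are passed through unchanged so that
    gaps in the tables stay visible in the output.
    """
    output = []
    for syllable in ipa_syllables:
        zhuyin = IPA_TO_ZHUYIN.get(syllable)
        hanzi = ZHUYIN_TO_HANZI.get(zhuyin) if zhuyin is not None else None
        output.append(hanzi if hanzi is not None else syllable)
    return "".join(output)


print(ipa_to_chinese(["pa", "ma"]))
```

Keeping the two tables separate mirrors the description above: the IPA to Zhuyin step can be reused for other outputs, while the character choice stays a purely editorial list.
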
## Installation

## Running

## TODO

- study behavior for ambiguous syllables (probably list some as exceptions)
- document khänträs
- footnote 200 p. 441
- dbu 'khyud
- implement p. 36
- long aspirations (lhod lhod in one big aspiration)
- high tone ma when it's negation (ma mthong : "doesn't see" or "sees the mother")
- add : after vowels in case of second suffix? (khams -> kʰâːm, kham -> kʰàm)
- test ཡར་འབྲོ/Co,y-a:m|~tr-
- test ཐིག་ལེ
- indicate ambiguity: ཤ་འབྲས = sh+am|tr-ä' or sha|tr-ä' according to pos
- option of word separation for exceptions, so that another syllable can be at position 1
- option for p. 432, note 196, aspirated consonants on second syllables
- དགོན་པ: p. 442, note 201, do something about it? gø~ pa
- indicate weak pronunciation of n and ng?
- geminates: impact on a/schwa?
- stop after m suffix
- phrase accents: -རྔ་སུ་རེད། and -རྔ་ང་རེད། : རྔ pronounced as ་ང
- ཀུན་དགའ < test n -> ng
- བར་ཆད -> war cho after another word?
- amdokä: p.39, optional rise from ang to eng
- idem p. 40: variation b/v in suffix, ar/er in last syllable?
This library should soon be available on PyPI, installable with pip.

## API

TODO

## Changes

See [CHANGELOG.md](CHANGELOG.md).

## License

The Python code is Copyright (C) 2018 Elie Roux, provided under [MIT License](LICENSE).
The Python code is Copyright (C) 2018 Esukhia, provided under [MIT License](LICENSE).
@@ -0,0 +1,22 @@
## TODO

- study behavior for ambiguous syllables (probably list some as exceptions)
- document khänträs
- footnote 200 p. 441
- dbu 'khyud
- long aspirations (lhod lhod in one big aspiration)
- high tone ma when it's negation (ma mthong : "doesn't see" or "sees the mother")
- add : after vowels in case of second suffix? (khams -> kʰâːm, kham -> kʰàm)
- test ཡར་འབྲོ/Co,y-a:m|~tr-
- test ཐིག་ལེ
- indicate ambiguity: ཤ་འབྲས = sh+am|tr-ä' or sha|tr-ä' according to pos
- option of word separation for exceptions, so that another syllable can be at position 1
- option for p. 432, note 196, aspirated consonants on second syllables
- དགོན་པ: p. 442, note 201, do something about it? gø~ pa
- indicate weak pronunciation of n and ng?
- geminates: impact on a/schwa?
- stop after m suffix
- ཀུན་དགའ < test n -> ng
- བར་ཆད -> war cho after another word?
- amdokä: p.39, optional rise from ang to eng
- idem p. 40: variation b/v in suffix, ar/er in last syllable?
11 files renamed without changes.
@@ -0,0 +1,53 @@
#! /usr/bin/env python
# -*- coding: utf8 -*-

from __future__ import print_function

import os
import sys
from setuptools import setup, find_packages


def read(fname):
    fname_rst = fname.replace('.md', '.rst')
    if os.path.exists(fname_rst):
        return open(os.path.join(os.path.dirname(__file__), fname_rst)).read()
    else:
        try:
            import pypandoc
            rst = pypandoc.convert_file(os.path.join(os.path.dirname(__file__), fname), 'rst')
            with open(fname_rst, 'w') as f:
                f.write(rst)
            return rst
        except (IOError, ImportError):
            return open(os.path.join(os.path.dirname(__file__), fname)).read()


setup(
    name="bophono",
    version="0.1.0",  # edit version in __init__.py
    author="Esukhia development team",
    author_email="[email protected]",
    description="Python utils for Tibetan phonetics in different dialects",
    license="MIT",
    keywords="phonetics ipa tibetan",
    url="https://github.com/Esukhia/bophono",
    packages=find_packages(),
    long_description=read('README.md'),
    project_urls={
        'Source': 'https://github.com/Esukhia/bophono',
        'Tracker': 'https://github.com/Esukhia/bophono/issues',
    },
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Topic :: Text Processing :: Linguistic",
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent",
        "Intended Audience :: Developers",
        "Intended Audience :: Science/Research",
        "License :: OSI Approved :: MIT License",
        "Natural Language :: Tibetan"
    ],
    package_data={'bophono': ['data/*']},
    python_requires='>=3',
)
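
The `read()` helper above converts the README to reStructuredText through pypandoc. A possible alternative, shown here only as a sketch and not as what this commit does, is to hand the Markdown to setuptools unchanged and declare its format with the `long_description_content_type` argument. The helper names `read_markdown` and `long_description_kwargs` are hypothetical.

```python
import os


def read_markdown(fname, base_dir="."):
    """Return the text of a Markdown file located in base_dir."""
    with open(os.path.join(base_dir, fname), encoding="utf-8") as f:
        return f.read()


def long_description_kwargs(fname="README.md", base_dir="."):
    """Build the setup() keyword arguments describing a Markdown README."""
    return {
        "long_description": read_markdown(fname, base_dir),
        # Tells setuptools and the package index that the text is Markdown,
        # so no conversion to reStructuredText is needed.
        "long_description_content_type": "text/markdown",
    }
```

The returned dictionary could then be unpacked into the existing call with `**long_description_kwargs()`, which would remove the optional pypandoc dependency at build time.
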