cleanup, reorganize, doc
eroux committed May 6, 2018
1 parent eae9b23 commit cf75ff3
Showing 15 changed files with 116 additions and 40 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,10 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2018-04-25
### Added
* initial release
71 changes: 31 additions & 40 deletions README.md
@@ -1,54 +1,45 @@
# Tibetan Phonetics Engine

The goal of this repository is to:
- provide an engine to interpret Tibetan in various phonetic transcription schemes
- implement the rules in [Tournadre](http://www.worldcat.org/oclc/916715611) (intro and Ann. 2) as a starting point
- implement a Chinese transcription that lets a Mandarin speaker pronounce or chant Tibetan scriptures

## Description

Ideally the engine will solely use configuration files, so that it can be phonetic scheme agnostic (no phonetics hardcoded).
The goal of this code is to provide a library to:
- implement the conversion of a Tibetan Unicode word into IPA, according to different schemes / dialects
- implement some conversions between IPA and phonetics readable by people with various language backgrounds (Chinese, English, etc.)

The various steps (for the Tournadre scheme, which is the most complex) will be:
- Tibetan Unicode -> Phonological scheme (given in Tournadre)
- Phonological scheme -> IPA (according to Annex 2 of Tournadre)
- IPA -> phonetic scheme
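
The three steps above can be sketched as a chain of table lookups. This is only a minimal illustration, not the library's implementation: the function names, the intermediate notation and every table entry are placeholders made up for the example. In the design described above, the real rules would live in configuration files.

```python
from typing import Callable, Dict, List

# A stage takes one token (here: a whole syllable) and returns its rewritten form.
Stage = Callable[[str], str]


def make_lookup_stage(table: Dict[str, str]) -> Stage:
    """Build a stage from a mapping; tokens missing from the table pass through unchanged."""
    def stage(token: str) -> str:
        return table.get(token, token)
    return stage


def run_pipeline(token: str, stages: List[Stage]) -> str:
    """Apply each stage in order, feeding the output of one into the next."""
    for stage in stages:
        token = stage(token)
    return token


# Placeholder tables (hypothetical values, NOT taken from Tournadre or from the library).
UNICODE_TO_PHONOLOGICAL = {"མ": "ma"}
PHONOLOGICAL_TO_IPA = {"ma": "ma"}
IPA_TO_SCHEME = {"ma": "mah"}  # made-up reader-friendly spelling

STAGES = [
    make_lookup_stage(UNICODE_TO_PHONOLOGICAL),
    make_lookup_stage(PHONOLOGICAL_TO_IPA),
    make_lookup_stage(IPA_TO_SCHEME),
]

result = run_pipeline("མ", STAGES)
```

Keeping each stage as plain data plus a generic lookup is one way to keep the engine scheme agnostic: swapping a scheme then means swapping a table, not changing code.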
The primary focus of this library is literary pronunciation, ideally representing how an umze would pronounce a traditional text, but contributions for other uses are welcome. We also do not handle Sanskrit transliteration (this can be done through custom exception lists).

The Chinese is produced by a streamlined phonetic scheme in order to match the Mandarin phonology (vowels have been simplified and most of the Tibetan suffixes removed).
Note that this library integrates no segmenter and must be applied to each word separately. You can use it in combination with [pybo](https://github.com/Esukhia/pybo/) to get the phonetics of full sentences.
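
Since no segmenter is built in, a caller has to split the text into words first and convert each word on its own. The sketch below shows that calling pattern only. `segment` and `word_to_phonetics` are hypothetical stand-ins (for a real segmenter such as pybo and for this library's converter, respectively) and do not reflect either project's actual API.

```python
from typing import Callable, List


def phonetics_for_sentence(
    sentence: str,
    segment: Callable[[str], List[str]],
    word_to_phonetics: Callable[[str], str],
    separator: str = " ",
) -> str:
    """Segment a sentence, convert each word independently, and join the results."""
    words = segment(sentence)
    return separator.join(word_to_phonetics(word) for word in words)


def naive_segment(sentence: str) -> List[str]:
    """Placeholder segmenter: splits on spaces.

    Real Tibetan text does not separate words with spaces, so an actual
    segmenter is needed in practice.
    """
    return sentence.split()


def fake_word_to_phonetics(word: str) -> str:
    """Placeholder converter: wraps the word instead of producing real phonetics."""
    return f"<{word}>"


output = phonetics_for_sentence("ཐིག་ལེ ཀུན་དགའ", naive_segment, fake_word_to_phonetics)
```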

## Phonetics methods

We currently provide two phonetic schemes:

#### Manual of Standard Tibetan (by Tournadre)

#### Colloquial Amdo Tibetan (by Kuo-ming Sung and Lha Byams Rgyal)

## Outputs

In brief, the generated IPA is converted to Zhuyin, also called Bopomofo (the transliteration system used for Taiwanese Mandarin), which in turn determines the chosen Traditional Chinese characters.
Apart from raw IPA, we provide the following output possibilities:

We focus exclusively on literary pronunciation, with options for reading pronunciation or oral pronunciation. Our aim is to express how an umze would pronounce a traditional text.
#### Chinese phonetics

The Chinese is produced by a streamlined phonetic scheme in order to match the Mandarin phonology (vowels have been simplified and most of the Tibetan suffixes removed).

To produce the final output, we first transform the generated IPA into [Zhuyin](https://en.wikipedia.org/wiki/Bopomofo), and then the Zhuyin into Traditional Chinese characters, with a manually built correspondence list.
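
The two-step conversion just described (IPA to Zhuyin, then Zhuyin to characters) might look roughly like the following. Both tables are tiny placeholders written for this example, not the library's manually built correspondence list, and the function name is hypothetical. Tones are ignored here.

```python
from typing import Dict, List

# Placeholder tables (hypothetical; the real correspondence list is built by hand).
IPA_TO_ZHUYIN: Dict[str, str] = {"ma": "ㄇㄚ", "la": "ㄌㄚ"}
ZHUYIN_TO_HANZI: Dict[str, str] = {"ㄇㄚ": "媽", "ㄌㄚ": "拉"}


def ipa_syllables_to_chinese(syllables: List[str]) -> str:
    """Convert IPA syllables to Traditional Chinese characters via Zhuyin.

    Raises KeyError when a syllable has no entry in either table, so that
    gaps in the correspondence list are noticed rather than silently skipped.
    """
    characters = []
    for syllable in syllables:
        if syllable not in IPA_TO_ZHUYIN:
            raise KeyError(f"no Zhuyin mapping for IPA syllable {syllable!r}")
        zhuyin = IPA_TO_ZHUYIN[syllable]
        if zhuyin not in ZHUYIN_TO_HANZI:
            raise KeyError(f"no character mapping for Zhuyin {zhuyin!r}")
        characters.append(ZHUYIN_TO_HANZI[zhuyin])
    return "".join(characters)


chinese = ipa_syllables_to_chinese(["ma", "la"])
```

Going through Zhuyin as an intermediate keeps the character choice independent of the IPA details: the hand-built list only needs one entry per Mandarin syllable.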

## Installation

## Running

## TODO

- study behavior for ambiguous syllables (probably list some as exceptions)
- document khänträs
- footnote 200 p. 441
- dbu 'khyud
- implement p. 36
- long aspirations (lhod lhod in one big aspiration)
- high tone ma when it's negation (ma mthong : "doesn't see" or "sees the mother")
- add : after vowels in case of second suffix? (khams -> kʰâːm, kham -> kʰàm)
- test ཡར་འབྲོ/Co,y-a:m|~tr-
- test ཐིག་ལེ
- indicate ambiguity: ཤ་འབྲས = sh+am|tr-ä' or sha|tr-ä' according to pos
- option of word separation for exceptions, so that another syllable can be at position 1
- option for p. 432, note 196, aspirated consonants on second syllables
- དགོན་པ: p. 442, note 201, do something about it? gø~ pa
- indicate weak pronunciation of n and ng?
- geminates: impact on a/schwa?
- stop after m suffix
- phrase accents: -རྔ་སུ་རེད། and -རྔ་ང་རེད། : རྔ pronounced as ་ང
- ཀུན་དགའ < test n -> ng
- བར་ཆད -> war cho after another word?
- amdokä: p.39, optional rise from ang to eng
- idem p. 40: variation b/v in suffix, ar/er in last syllable?
This library should be available through pip soon.

## API

TODO

## Changes

See [CHANGELOG.md](CHANGELOG.md).

## License

The Python code is Copyright (C) 2018 Elie Roux, provided under [MIT License](LICENSE).
The Python code is Copyright (C) 2018 Esukhia, provided under [MIT License](LICENSE).
22 changes: 22 additions & 0 deletions TODO.md
@@ -0,0 +1,22 @@
## TODO

- study behavior for ambiguous syllables (probably list some as exceptions)
- document khänträs
- footnote 200 p. 441
- dbu 'khyud
- long aspirations (lhod lhod in one big aspiration)
- high tone ma when it's negation (ma mthong : "doesn't see" or "sees the mother")
- add : after vowels in case of second suffix? (khams -> kʰâːm, kham -> kʰàm)
- test ཡར་འབྲོ/Co,y-a:m|~tr-
- test ཐིག་ལེ
- indicate ambiguity: ཤ་འབྲས = sh+am|tr-ä' or sha|tr-ä' according to pos
- option of word separation for exceptions, so that another syllable can be at position 1
- option for p. 432, note 196, aspirated consonants on second syllables
- དགོན་པ: p. 442, note 201, do something about it? gø~ pa
- indicate weak pronunciation of n and ng?
- geminates: impact on a/schwa?
- stop after m suffix
- ཀུན་དགའ < test n -> ng
- བར་ཆད -> war cho after another word?
- amdokä: p.39, optional rise from ang to eng
- idem p. 40: variation b/v in suffix, ar/er in last syllable?
11 files renamed without changes.
53 changes: 53 additions & 0 deletions setup.py
@@ -0,0 +1,53 @@
#! /usr/bin/env python
# -*- coding: utf8 -*-

from __future__ import print_function

import os
import sys
from setuptools import setup, find_packages


def read(fname):
    fname_rst = fname.replace('.md', '.rst')
    if os.path.exists(fname_rst):
        return open(os.path.join(os.path.dirname(__file__), fname_rst)).read()
    else:
        try:
            import pypandoc
            rst = pypandoc.convert(os.path.join(os.path.dirname(__file__), fname), 'rst')
            with open(fname_rst, 'w') as f:
                f.write(rst)
            return rst
        except (IOError, ImportError):
            return open(os.path.join(os.path.dirname(__file__), fname)).read()


setup(
    name="bophono",
    version="0.1.0", #edit version in __init__.py
    author="Esukhia development team",
    author_email="[email protected]",
    description="Python utils for Tibetan phonetics in different dialects",
    license="MIT",
    keywords="phonetics ipa tibetan",
    url="https://github.com/Esukhia/bophono",
    packages=find_packages(),
    long_description=read('README.md'),
    project_urls={
        'Source': 'https://github.com/Esukhia/bophono',
        'Tracker': 'https://github.com/Esukhia/bophono/issues',
    },
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Topic :: Text Processing :: Linguistic",
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent",
        "Intended Audience :: Developers",
        "Intended Audience :: Science/Research",
        "License :: OSI Approved :: MIT License",
        "Natural Language :: Tibetan"
    ],
    package_data={'bophono': ['data/*']},
    python_requires='>=3',
)
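
The `read` helper above converts the README to reStructuredText through pypandoc and falls back to the raw Markdown when that fails. A possible alternative, not part of this commit, is to pass the Markdown as-is and declare its type with the `long_description_content_type` argument, which setuptools supports from version 38.6.0. The sketch below assumes that route; `readme_kwargs` is a hypothetical helper name.

```python
import io


def readme_kwargs(readme_path):
    """Return setup() keyword arguments describing a Markdown README.

    Hypothetical helper: reads the file as UTF-8 and declares it as
    Markdown so that no conversion to reStructuredText is needed.
    """
    with io.open(readme_path, encoding="utf-8") as f:
        text = f.read()
    return {
        "long_description": text,
        "long_description_content_type": "text/markdown",
    }


# Possible use inside setup.py (shown as a comment, not executed here):
#     setup(name="bophono", version="0.1.0", **readme_kwargs("README.md"))
```

Opening the file with an explicit encoding also avoids depending on the platform default, which matters for a README containing Tibetan script.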
