diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..9bc7128
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,10 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
+and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+
+## [0.1.0] - 2018-04-25
+### Added
+- Initial release.
diff --git a/README.md b/README.md
index f8880ae..ea9df44 100644
--- a/README.md
+++ b/README.md
@@ -1,54 +1,45 @@
 # Tibetan Phonetics Engine
 
-The goal of this repository is to:
-- provide an engine to interpret Tibetan in various phonetic transcription schemes
-- implement the rules in [Tournadre](http://www.worldcat.org/oclc/916715611) (intro and Ann. 2) as a starting point
-- implement the Chinese transcription letting a speaker of Mandarin pronounces or chants Tibetan scriptures
-
 ## Description
 
-Ideally the engine will solely use configuration files, so that it can be phonetic scheme agnostic (no phonetics hardcoded).
+The goal of this code is to provide a library that can:
+- convert a Tibetan Unicode word into IPA, according to different schemes and dialects
+- convert IPA into phonetics readable by people with various language backgrounds (Chinese, English, etc.)
 
-The various steps (for the Tournadre scheme, which is the most complex) will be:
-- Tibetan unicode -> Phonological scheme (given in Tournadre)
-- Phonological scheme -> IPA (according to Annex 2 of Tournadre)
-- IPA -> phonetic scheme
+The primary focus of this library is literary pronunciation, ideally representing how an umze (chant leader) would pronounce a traditional text, but contributions for other uses are welcome. The library does not handle Sanskrit transliteration; this can be done through custom exception lists.
 
-The Chinese is produced by a streamlined phonetic scheme in order to match the Mandarin phonology (vowels have been simplified and most of the Tibetan suffixes removed).
+Note that this library does not include a word segmenter and must be applied to each word separately. You can use it in combination with [pybo](https://github.com/Esukhia/pybo/) to get the phonetics of full sentences.
+
+## Phonetic schemes
+
+We currently provide two phonetic schemes:
+
+#### Manual of Standard Tibetan (by Tournadre)
+
+#### Colloquial Amdo Tibetan (by Kuo-ming Sung and Lha Byams Rgyal)
+
+## Outputs
 
-In brief, the generated IPA brings the Zhuyin or Bopomofo (Chinese transliteration system for Taiwanese Mandarin), which in turn provides the appropriate and chosen traditional Chinese sinograms.
+Apart from raw IPA, we provide the following outputs:
 
-We focus exclusively on litterary pronounciation, and have options for reading pronounciation or oral pronounciation. Our focus is to be able express how an umze would pronounce a traditional text.
+#### Chinese phonetics
+
+The Chinese output is produced from a streamlined phonetic scheme designed to match Mandarin phonology (vowels are simplified and most Tibetan suffixes are removed).
+
+To produce the final output, we first transform the generated IPA into [Zhuyin](https://en.wikipedia.org/wiki/Bopomofo), and then convert the Zhuyin into Traditional Chinese characters using a manually built correspondence list.
 
 ## Installation
 
-## Running
-
-## TODO
-
-- study behavior for ambiguous syllables (probably list some as exceptions)
-- document kh¨antr¨as
-- footnote 200 p. 441
-- dbu 'khyud
-- implement p. 36
-- long aspirations (lhod lhod in one big aspiration)
-- high tone ma when it's negation (ma mthong : "doesn't see" or "sees the mother")
-- add : after vowels in case of second suffix? (khams -> kʰâːm, kham -> kʰàm)
-- test ཡར་འབྲོ/Co,y-a:m|~tr-
-- test ཐིག་ལེ
-- indicate ambiguity: ཤ་འབྲས = sh+am|tr-ä' or sha|tr-ä' according to pos
-- option of word separation for exceptions, so that another syllable can be at position 1
-- option for p. 432, note 196, aspirated consonnants on second syllables
-- དགོན་པ: p. 442, note 201, do something about it? gø~ pa
-- indicate weak pronounciation of n and ng?
-- geminates: impact on a/schwa?
-- stop after m suffix
-- phrase accents: -རྔ་སུ་རེད། and -རྔ་ང་རེད། : རྔ pronounced as ་ང
-- ཀུན་དགའ < test n -> ng
-- བར་ཆད -> war cho after another word?
-- amdokä: p.39, optional rise from ang to eng
-- idem p. 40: variation b/v in suffix, ar/er in last syllable?
+This library should be available on PyPI soon, installable with pip.
+
+## API
+
+TODO
+
+## Changes
+
+See [CHANGELOG.md](CHANGELOG.md).
 
 ## License
 
-The Python code is Copyright (C) 2018 Elie Roux, provided under [MIT License](LICENSE).
+The Python code is Copyright (C) 2018 Esukhia, provided under the [MIT License](LICENSE).
diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..c4d1b66
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,22 @@
+## TODO
+
+- study behavior for ambiguous syllables (probably list some as exceptions)
+- document kh¨antr¨as
+- footnote 200 p. 441
+- dbu 'khyud
+- long aspirations (lhod lhod in one big aspiration)
+- high tone ma when it's negation (ma mthong : "doesn't see" or "sees the mother")
+- add : after vowels in case of second suffix? (khams -> kʰâːm, kham -> kʰàm)
+- test ཡར་འབྲོ/Co,y-a:m|~tr-
+- test ཐིག་ལེ
+- indicate ambiguity: ཤ་འབྲས = sh+am|tr-ä' or sha|tr-ä' according to pos
+- option of word separation for exceptions, so that another syllable can be at position 1
+- option for p. 432, note 196, aspirated consonants on second syllables
+- དགོན་པ: p. 442, note 201, do something about it? gø~ pa
+- indicate weak pronunciation of n and ng?
+- geminates: impact on a/schwa?
+- stop after m suffix
+- ཀུན་དགའ < test n -> ng
+- བར་ཆད -> war cho after another word?
+- amdokä: p.39, optional rise from ang to eng
+- idem p. 40: variation b/v in suffix, ar/er in last syllable?
diff --git a/PhonStateMST.py b/bophono/PhonStateMST.py
similarity index 100%
rename from PhonStateMST.py
rename to bophono/PhonStateMST.py
diff --git a/tibpho.py b/bophono/bophono.py
similarity index 100%
rename from tibpho.py
rename to bophono/bophono.py
diff --git a/data/README.md b/bophono/data/README.md
similarity index 100%
rename from data/README.md
rename to bophono/data/README.md
diff --git a/data/chinese/chinese_trad.csv b/bophono/data/chinese/chinese_trad.csv
similarity index 100%
rename from data/chinese/chinese_trad.csv
rename to bophono/data/chinese/chinese_trad.csv
diff --git a/data/chinese/equivalence.csv b/bophono/data/chinese/equivalence.csv
similarity index 100%
rename from data/chinese/equivalence.csv
rename to bophono/data/chinese/equivalence.csv
diff --git a/data/chinese/exception.csv b/bophono/data/chinese/exception.csv
similarity index 100%
rename from data/chinese/exception.csv
rename to bophono/data/chinese/exception.csv
diff --git a/data/chinese/zhuyin.csv b/bophono/data/chinese/zhuyin.csv
similarity index 100%
rename from data/chinese/zhuyin.csv
rename to bophono/data/chinese/zhuyin.csv
diff --git a/data/ends.csv b/bophono/data/ends.csv
similarity index 100%
rename from data/ends.csv
rename to bophono/data/ends.csv
diff --git a/data/exceptions.csv b/bophono/data/exceptions.csv
similarity index 100%
rename from data/exceptions.csv
rename to bophono/data/exceptions.csv
diff --git a/data/roots.csv b/bophono/data/roots.csv
similarity index 100%
rename from data/roots.csv
rename to bophono/data/roots.csv
diff --git a/sdtrie.py b/bophono/sdtrie.py
similarity index 100%
rename from sdtrie.py
rename to bophono/sdtrie.py
diff --git a/setup.py b/setup.py
new file mode 100644
index 0000000..fac0033
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,58 @@
+#! /usr/bin/env python
+# -*- coding: utf8 -*-
+
+from __future__ import print_function
+
+import os
+import sys
+from setuptools import setup, find_packages
+
+
+def read(fname):
+    # Resolve paths relative to this file so setup.py works from any directory
+    here = os.path.dirname(os.path.abspath(__file__))
+    path_md = os.path.join(here, fname)
+    path_rst = os.path.join(here, fname.replace('.md', '.rst'))
+    if os.path.exists(path_rst):
+        with open(path_rst, encoding='utf-8') as f:
+            return f.read()
+    try:
+        import pypandoc
+        rst = pypandoc.convert_file(path_md, 'rst')
+        with open(path_rst, 'w', encoding='utf-8') as f:
+            f.write(rst)
+        return rst
+    except (OSError, ImportError, RuntimeError):
+        # pypandoc/pandoc unavailable or conversion failed: fall back to raw Markdown
+        with open(path_md, encoding='utf-8') as f:
+            return f.read()
+
+
+setup(
+    name="bophono",
+    version="0.1.0",  # edit version in __init__.py
+    author="Esukhia development team",
+    author_email="esukhiadev@gmail.com",
+    description="Python utils for Tibetan phonetics in different dialects",
+    license="MIT",
+    keywords="phonetics ipa tibetan",
+    url="https://github.com/Esukhia/bophono",
+    packages=find_packages(),
+    long_description=read('README.md'),
+    project_urls={
+        'Source': 'https://github.com/Esukhia/bophono',
+        'Tracker': 'https://github.com/Esukhia/bophono/issues',
+    },
+    classifiers=[
+        "Development Status :: 3 - Alpha",
+        "Topic :: Text Processing :: Linguistic",
+        "Programming Language :: Python :: 3",
+        "Operating System :: OS Independent",
+        "Intended Audience :: Developers",
+        "Intended Audience :: Science/Research",
+        "License :: OSI Approved :: MIT License",
+        "Natural Language :: Tibetan"
+    ],
+    # 'data/*' alone does not descend into data/chinese/, so list it explicitly
+    package_data={'bophono': ['data/*', 'data/chinese/*']},
+    python_requires='>=3',
+)
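The README describes the Chinese output as two table lookups: the generated IPA is first mapped to Zhuyin, then the Zhuyin to Traditional Chinese characters through a manually built correspondence list. A minimal sketch of that pipeline follows. It assumes one IPA syllable per lookup key; the tables are toy values invented for illustration, not the contents of `bophono/data/chinese/*.csv` (whose column layout is not assumed here), and `simplify` and `ipa_syllables_to_chinese` are hypothetical names, not part of bophono's API. The `simplify` step (dropping tone diacritics and the length mark) only illustrates the idea of a streamlined scheme; the README does not specify which simplifications the library actually applies.

```python
import unicodedata
from typing import Dict, List

# Toy tables for illustration only; NOT the contents of bophono/data/chinese/*.csv.
IPA_TO_ZHUYIN: Dict[str, str] = {
    "k\u02b0a": "ㄎㄚ",  # kʰa
    "ma": "ㄇㄚ",
}
ZHUYIN_TO_HANZI: Dict[str, str] = {
    "ㄎㄚ": "卡",
    "ㄇㄚ": "瑪",
}


def simplify(ipa: str) -> str:
    """Hypothetical streamlining step: drop combining diacritics (such as
    tone marks) and the IPA length mark, which the Mandarin output ignores."""
    decomposed = unicodedata.normalize("NFD", ipa)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept.replace("\u02d0", ""))


def ipa_syllables_to_chinese(syllables: List[str]) -> str:
    """IPA syllables -> Zhuyin -> Traditional Chinese characters.

    Syllables missing from either table are kept in brackets so gaps stay visible.
    """
    out = []
    for syl in syllables:
        key = simplify(syl)
        zhuyin = IPA_TO_ZHUYIN.get(key)
        hanzi = ZHUYIN_TO_HANZI.get(zhuyin) if zhuyin is not None else None
        out.append(hanzi if hanzi is not None else "[" + key + "]")
    return "".join(out)


print(ipa_syllables_to_chinese(["k\u02b0\u00e2\u02d0", "m\u00e0"]))  # 卡瑪
```

Unknown syllables are bracketed rather than dropped, so missing entries in the correspondence list are easy to spot; a fuller version would load both tables from the CSV files instead of hard-coding them.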