cleanup, reorganize, doc

Esukhia · May 6, 2018 · cf75ff3
1 parent eae9b23
Showing 15 changed files with 116 additions and 40 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,10 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
+and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+
+## [0.1.0] - 2018-04-25
+### Added
+ * initial release
diff --git a/README.md b/README.md
@@ -1,54 +1,45 @@
 # Tibetan Phonetics Engine
 
-The goal of this repository is to:
-- provide an engine to interpret Tibetan in various phonetic transcription schemes
-- implement the rules in [Tournadre](http://www.worldcat.org/oclc/916715611) (intro and Ann. 2) as a starting point
-- implement the Chinese transcription letting a speaker of Mandarin pronounces or chants Tibetan scriptures
-
 ## Description
 
-Ideally the engine will solely use configuration files, so that it can be phonetic scheme agnostic (no phonetics hardcoded).
+The goal of this code is to provide a library that:
+- converts a Tibetan Unicode word into IPA, according to different schemes / dialects
+- offers some conversions between IPA and phonetic transcriptions readable by people with various language backgrounds (Chinese, English, etc.)
 
-The various steps (for the Tournadre scheme, which is the most complex) will be:
-- Tibetan unicode -> Phonological scheme (given in Tournadre)
-- Phonological scheme -> IPA (according to Annex 2 of Tournadre)
-- IPA -> phonetic scheme
+The primary focus of this library is literary pronunciation, ideally representing how an umze would pronounce a traditional text, but contributions for other uses are welcome. We do not handle Sanskrit transliteration, although this can be done through custom exception lists.
 
-The Chinese is produced by a streamlined phonetic scheme in order to match the Mandarin phonology (vowels have been simplified and most of the Tibetan suffixes removed).
+Note that this library does not include a segmenter and must be applied to each word separately. You can use it in combination with [pybo](https://github.com/Esukhia/pybo/) to get the phonetics of full sentences.
+
+## Phonetic methods
+
+We currently provide two phonetic schemes:
+
+#### Manual of Standard Tibetan (by Tournadre)
+
+#### Colloquial Amdo Tibetan (by Kuo-ming Sung and Lha Byams Rgyal)
+
+## Outputs
 
-In brief, the generated IPA brings the Zhuyin or Bopomofo (Chinese transliteration system for Taiwanese Mandarin), which in turn provides the appropriate and chosen traditional Chinese sinograms.
+Apart from raw IPA, we provide the following output possibilities:
 
-We focus exclusively on litterary pronounciation, and have options for reading pronounciation or oral pronounciation. Our focus is to be able express how an umze would pronounce a traditional text.
+#### Chinese phonetics
+
+The Chinese output is produced by a streamlined phonetic scheme designed to match Mandarin phonology (vowels have been simplified and most of the Tibetan suffixes removed).
+
+To produce the final output, we first transform the generated IPA into [Zhuyin](https://en.wikipedia.org/wiki/Bopomofo), and then the Zhuyin into Traditional Chinese characters, using a manually built correspondence list.
 
 ## Installation
 
-## Running
-
-## TODO
-
-- study behavior for ambiguous syllables (probably list some as exceptions)
-- document kh¨antr¨as
-- footnote 200 p. 441
-- dbu 'khyud
-- implement p. 36
-- long aspirations (lhod lhod in one big aspiration)
-- high tone ma when it's negation (ma mthong : "doesn't see" or "sees the mother")
-- add : after vowels in case of second suffix? (khams -> kʰâːm, kham -> kʰàm)
-- test ཡར་འབྲོ/Co,y-a:m|~tr-
-- test ཐིག་ལེ
-- indicate ambiguity: ཤ་འབྲས = sh+am|tr-ä' or sha|tr-ä' according to pos
-- option of word separation for exceptions, so that another syllable can be at position 1
-- option for p. 432, note 196, aspirated consonnants on second syllables
-- དགོན་པ: p. 442, note 201, do something about it? gø~ pa
-- indicate weak pronounciation of n and ng?
-- geminates: impact on a/schwa?
-- stop after m suffix
-- phrase accents: -རྔ་སུ་རེད། and -རྔ་ང་རེད། : རྔ pronounced as ་ང
-- ཀུན་དགའ < test n -> ng
-- བར་ཆད -> war cho after another word?
-- amdokä: p.39, optional rise from ang to eng
-- idem p. 40: variation b/v in suffix, ar/er in last syllable?
+This library should soon be available on PyPI (installable with pip).
+
+## API
+
+TODO
+
+## Changes
+
+See [CHANGELOG.md](CHANGELOG.md).
 
 ## License
 
-The Python code is Copyright (C) 2018 Elie Roux, provided under [MIT License](LICENSE).
+The Python code is Copyright (C) 2018 Esukhia, provided under [MIT License](LICENSE).
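
Note on the Chinese output described in the README above: the two-stage lookup (IPA to Zhuyin, then Zhuyin to Traditional Chinese characters) can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `ipa_to_chinese` is hypothetical, and the two small tables are placeholder entries chosen for the example, not the contents of the repository's CSV files.

```python
# Placeholder tables for illustration only; the real mappings live in the
# repository's CSV files and are NOT reproduced here.
IPA_TO_ZHUYIN = {"ma": "ㄇㄚ", "ni": "ㄋㄧ"}
ZHUYIN_TO_HANZI = {"ㄇㄚ": "瑪", "ㄋㄧ": "尼"}


def ipa_to_chinese(ipa_syllables):
    """Map IPA syllables to Traditional Chinese characters through Zhuyin.

    Hypothetical helper: each syllable is looked up in the IPA -> Zhuyin
    table, then the Zhuyin string in the Zhuyin -> character table.
    """
    out = []
    for syllable in ipa_syllables:
        zhuyin = IPA_TO_ZHUYIN.get(syllable)
        hanzi = ZHUYIN_TO_HANZI.get(zhuyin) if zhuyin is not None else None
        # Keep unmapped syllables visible instead of dropping them silently.
        out.append(hanzi if hanzi is not None else "[" + syllable + "]")
    return "".join(out)


result = ipa_to_chinese(["ma", "ni"])
```

Keeping Zhuyin as an intermediate key means the character list can be edited by hand without touching the IPA side, which matches the "manually built correspondence list" mentioned in the README.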
diff --git a/TODO.md b/TODO.md
@@ -0,0 +1,22 @@
+## TODO
+
+- study behavior for ambiguous syllables (probably list some as exceptions)
+- document kh¨antr¨as
+- footnote 200 p. 441
+- dbu 'khyud
+- long aspirations (lhod lhod in one big aspiration)
+- high tone ma when it's negation (ma mthong : "doesn't see" or "sees the mother")
+- add : after vowels in case of second suffix? (khams -> kʰâːm, kham -> kʰàm)
+- test ཡར་འབྲོ/Co,y-a:m|~tr-
+- test ཐིག་ལེ
+- indicate ambiguity: ཤ་འབྲས = sh+am|tr-ä' or sha|tr-ä' according to pos
+- option of word separation for exceptions, so that another syllable can be at position 1
+- option for p. 432, note 196, aspirated consonants on second syllables
+- དགོན་པ: p. 442, note 201, do something about it? gø~ pa
+- indicate weak pronunciation of n and ng?
+- geminates: impact on a/schwa?
+- stop after m suffix
+- ཀུན་དགའ < test n -> ng
+- བར་ཆད -> war cho after another word?
+- amdokä: p.39, optional rise from ang to eng
+- idem p. 40: variation b/v in suffix, ar/er in last syllable?
diff --git a/PhonStateMST.py b/bophono/PhonStateMST.py
diff --git a/tibpho.py b/bophono/bophono.py
diff --git a/data/README.md b/bophono/data/README.md
diff --git a/data/chinese/chinese_trad.csv b/bophono/data/chinese/chinese_trad.csv
diff --git a/data/chinese/equivalence.csv b/bophono/data/chinese/equivalence.csv
diff --git a/data/chinese/exception.csv b/bophono/data/chinese/exception.csv
diff --git a/data/chinese/zhuyin.csv b/bophono/data/chinese/zhuyin.csv
diff --git a/data/ends.csv b/bophono/data/ends.csv
diff --git a/data/exceptions.csv b/bophono/data/exceptions.csv
diff --git a/data/roots.csv b/bophono/data/roots.csv
diff --git a/sdtrie.py b/bophono/sdtrie.py
diff --git a/setup.py b/setup.py
@@ -0,0 +1,53 @@
+#! /usr/bin/env python
+# -*- coding: utf8 -*-
+
+import os
+from setuptools import setup, find_packages
+
+
+def read(fname):
+    # Prefer a reStructuredText copy of fname; fall back to the raw file.
+    here = os.path.dirname(os.path.abspath(__file__))
+    path = os.path.join(here, fname)
+    path_rst = os.path.splitext(path)[0] + '.rst'
+    if not os.path.exists(path_rst):
+        try:
+            # pypandoc is optional; convert_file() replaces the deprecated convert()
+            import pypandoc
+            rst = pypandoc.convert_file(path, 'rst')
+            with open(path_rst, 'w', encoding='utf-8') as f:
+                f.write(rst)
+        except (IOError, ImportError):
+            path_rst = path  # no pypandoc/pandoc: use the Markdown file as-is
+    with open(path_rst, encoding='utf-8') as f:
+        return f.read()
+
+
+setup(
+    name="bophono",
+    version="0.1.0",  #edit version in __init__.py
+    author="Esukhia development team",
+    author_email="[email protected]",
+    description="Python utils for Tibetan phonetics in different dialects",
+    license="MIT",
+    keywords="phonetics ipa tibetan",
+    url="https://github.com/Esukhia/bophono",
+    packages=find_packages(),
+    long_description=read('README.md'),
+    project_urls={
+        'Source': 'https://github.com/Esukhia/bophono',
+        'Tracker': 'https://github.com/Esukhia/bophono/issues',
+    },
+    classifiers=[
+        "Development Status :: 3 - Alpha",
+        "Topic :: Text Processing :: Linguistic",
+        "Programming Language :: Python :: 3",
+        "Operating System :: OS Independent",
+        "Intended Audience :: Developers",
+        "Intended Audience :: Science/Research",
+        "License :: OSI Approved :: MIT License",
+        "Natural Language :: Tibetan"
+    ],
+    package_data={'bophono': ['data/*', 'data/chinese/*']},
+    python_requires='>=3',
+)
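
Note on `package_data`: as far as I understand, setuptools expands these patterns with glob-style matching and keeps only regular files, so a pattern such as `data/*` does not reach files inside a subdirectory like `data/chinese/`. The sketch below imitates that behaviour on a throwaway directory tree; `matched_files` and `make_layout` are hypothetical helpers written for this illustration and are not part of setuptools or bophono.

```python
import glob
import os
import tempfile


def matched_files(package_dir, patterns):
    """Return the files (not directories) matched by the glob patterns,
    as sorted paths relative to package_dir."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):
                rel = os.path.relpath(path, package_dir)
                found.add(rel.replace(os.sep, "/"))
    return sorted(found)


def make_layout(root):
    """Create an empty tree mirroring part of the data layout in this commit."""
    for rel in ("data/roots.csv", "data/chinese/zhuyin.csv"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


with tempfile.TemporaryDirectory() as root:
    make_layout(root)
    # One level only: the nested CSV file is not matched.
    flat = matched_files(root, ["data/*"])
    # Adding an explicit pattern for the subdirectory picks it up.
    nested = matched_files(root, ["data/*", "data/chinese/*"])
```

Here `flat` contains only `data/roots.csv`, while `nested` also contains `data/chinese/zhuyin.csv`, which is why each data subdirectory generally needs its own pattern.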