A pip-installable version of RDRPOSTagger with Tibetan-specific changes.
- See the original RDRPOSTagger for documentation.
- Check the modifications implemented in this repo.
- See rdr-data for RDR models for Tibetan.
- See usage.py for the programmatic interface available in bordr.
Build the source dist:
rm -rf dist/
python3 setup.py clean sdist
and upload with twine (version >= 1.11.0):
twine upload dist/*
The SDICT content used to generate the INIT file has been changed. Every word in SDICT is given the U tag (Unique, from the BILOU tagging scheme), because botok segments each of those words as a single, unique token. With this modified SDICT, the generated INIT file reflects botok's segmentation, so the rules learned from it can resolve botok segmentation ambiguities.
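The idea can be illustrated with a minimal sketch. It does not use the bordr or RDRPOSTagger API: the helper names `build_sdict` and `initial_tagging`, the in-memory dict representation of SDICT, and the fallback tag `X` are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

UNIQUE_TAG = "U"  # "Unique" tag from the BILOU scheme


def build_sdict(words: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: give every lexicon word the U tag."""
    return {word: UNIQUE_TAG for word in words}


def initial_tagging(
    tokens: List[str],
    sdict: Dict[str, str],
    default_tag: str = "X",  # assumed placeholder for tokens missing from SDICT
) -> List[Tuple[str, str]]:
    """Hypothetical helper: tag tokens by dictionary lookup, as an INIT step would."""
    return [(token, sdict.get(token, default_tag)) for token in tokens]


if __name__ == "__main__":
    sdict = build_sdict(["བཀྲ་ཤིས་", "བདེ་ལེགས་"])
    for token, tag in initial_tagging(["བཀྲ་ཤིས་", "བདེ་ལེགས་", "ཀ་"], sdict):
        print(f"{token}/{tag}")
```

In this sketch, tokens that botok would emit as whole words start out tagged `U`, and anything else falls back to the placeholder tag, which is the starting point the learned rules would then correct.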