This repository demonstrates a basic architecture for the core algorithms of a Chinese IME (Input Method Editor).
zhongwen to zhong'wen

zhong'wen to [ "中文", "仲文" ]
- Chinese Pinyin dictionary
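
The two steps above (splitting `zhongwen` into `zhong'wen`, then mapping it to candidate words) can be sketched as follows. This is a minimal illustration, not the repository's implementation: `SYLLABLES`, `DICTIONARY`, `segment`, and `candidates` are hypothetical names, and the tiny tables stand in for a full syllable table and a real Pinyin dictionary. Segmentation here assumes a dynamic-programming search for the split with the fewest valid syllables; a production IME would typically rank competing splits with a language model.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data: a real IME would load a complete syllable table
# and a full Pinyin dictionary instead of these few entries.
SYLLABLES = {"zhong", "wen", "xi", "an", "xian", "ni", "hao"}
MAX_SYLLABLE_LEN = max(len(s) for s in SYLLABLES)

DICTIONARY: Dict[Tuple[str, ...], List[str]] = {
    ("zhong", "wen"): ["中文", "仲文"],
    ("ni", "hao"): ["你好"],
}


def segment(pinyin: str) -> Optional[List[str]]:
    """Split an unsegmented pinyin string into the fewest valid syllables.

    Returns None when no split into known syllables exists.
    """
    # best[i] holds the shortest syllable list covering pinyin[:i], or None.
    best: List[Optional[List[str]]] = [None] * (len(pinyin) + 1)
    best[0] = []
    for end in range(1, len(pinyin) + 1):
        for start in range(max(0, end - MAX_SYLLABLE_LEN), end):
            prefix = best[start]
            piece = pinyin[start:end]
            if prefix is None or piece not in SYLLABLES:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[len(pinyin)]


def candidates(segmented: str) -> List[str]:
    """Look up candidate words for an apostrophe-separated pinyin string."""
    key = tuple(segmented.split("'"))
    return DICTIONARY.get(key, [])


syllables = segment("zhongwen")
if syllables is not None:
    segmented = "'".join(syllables)
    print(segmented)              # zhong'wen
    print(candidates(segmented))  # ['中文', '仲文']
```

Preferring the fewest syllables means an ambiguous input such as `xian` resolves to the single syllable `xian` rather than `xi'an`; that is a simplification chosen for this sketch.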
Next-word prediction means predicting the most probable next words (candidates) given the user's input context.
- N-gram: Peter F Brown, Vincent J Della Pietra, Peter V Desouza, Jennifer C Lai, and Robert L Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics 18, 4 (1992), 467–480.
- Transformers: pretrained models, a GitHub repo
- 清源 CPM (Qingyuan CPM, Chinese Pretrained Models)
- jieba
- sego
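
As a rough illustration of the n-gram approach listed above, the sketch below trains a plain bigram model on a toy corpus and returns the most frequent followers of the previous word. It is a simplified word-level bigram counter, not the class-based model from the cited paper and not the repository's code. `CORPUS`, `train_bigrams`, and `predict_next` are hypothetical names, and the sentences are assumed to be already segmented into words (in practice a word segmenter such as the ones listed above would produce them).

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical toy corpus, already segmented into words.
CORPUS: List[List[str]] = [
    ["我", "喜欢", "中文"],
    ["我", "喜欢", "输入法"],
    ["我", "喜欢", "中文"],
    ["他", "学习", "中文"],
]


def train_bigrams(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each word follows each preceding word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        for prev_word, next_word in zip(sentence, sentence[1:]):
            counts[prev_word][next_word] += 1
    return counts


def predict_next(counts: Dict[str, Counter], prev_word: str, k: int = 3) -> List[str]:
    """Return up to k next-word candidates, most frequent first."""
    followers = counts.get(prev_word, Counter())
    return [word for word, _ in followers.most_common(k)]


model = train_bigrams(CORPUS)
print(predict_next(model, "喜欢"))  # ['中文', '输入法']
```

Raw counts are enough to rank candidates for a single context word; a fuller model would add smoothing and back off to shorter contexts for unseen words.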
Emoji prediction means predicting the most probable emojis (candidates) associated with the user's input context.
- W. Ma, R. Liu, L. Wang, and S. Vosoughi, Emoji prediction: Extensions and benchmarking, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2020.
- S. Ramaswamy, R. Mathews, K. Rao, and F. Beaufays, Federated learning for emoji prediction in a mobile keyboard, arXiv preprint arXiv:1906.04329, 2019.
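
To make the task concrete, here is a minimal count-based baseline: it tallies which emojis co-occur with which words in a few labeled messages, then scores emojis for a new context by summing those counts. This is deliberately much simpler than the neural and federated approaches in the references above, and it is not the repository's method. `EXAMPLES`, `train_emoji_counts`, and `predict_emojis` are hypothetical names, and the training data is invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy training data: (segmented message, emoji used with it).
EXAMPLES: List[Tuple[List[str], str]] = [
    (["生日", "快乐"], "🎂"),
    (["新年", "快乐"], "🎉"),
    (["生日", "蛋糕"], "🎂"),
    (["好", "开心"], "😊"),
]


def train_emoji_counts(examples: List[Tuple[List[str], str]]) -> Dict[str, Counter]:
    """Count word/emoji co-occurrences in the labeled messages."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, emoji in examples:
        for token in tokens:
            counts[token][emoji] += 1
    return counts


def predict_emojis(counts: Dict[str, Counter], tokens: List[str], k: int = 2) -> List[str]:
    """Score each emoji by summed co-occurrence with the context words."""
    scores: Counter = Counter()
    for token in tokens:
        scores.update(counts.get(token, Counter()))
    return [emoji for emoji, _ in scores.most_common(k)]


emoji_model = train_emoji_counts(EXAMPLES)
print(predict_emojis(emoji_model, ["生日", "快乐"]))  # ['🎂', '🎉']
```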