End-to-end Parser for Eastern Armenian

This repository contains the tools needed to parse raw Eastern Armenian text. It has a script, run.sh, which takes raw text as input and produces a CoNLL-U file with lemmas, morphological features, part-of-speech tags and dependency trees.

The parser segments the text into sentences and tokenizes them using ArmTreeBank's Tokenizer module.
Lemmatization, POS tagging and dependency parsing are performed by a neural network called COMBO, which was developed and open-sourced by Piotr Rybak and Alina Wroblewska from the Institute of Computer Science, Polish Academy of Sciences. If you use this network, please cite their paper.
We have trained COMBO on the training set of the ArmTDP treebank from UD v2.3.
The accuracy of the parser is far from perfect, because it has been trained on only ~500 sentences. The table below shows the accuracy on the test set of the same treebank.

Metric	Accuracy
Lemmatization	88.05%
Part-of-speech tagging	85.07%
Morphological features	70.21%
Dependency parsing (Labelled attachment score)	55.25%
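
For reference, the labelled attachment score (LAS) counts a token as correct only when both its predicted head and its dependency label match the gold annotation. Below is a minimal sketch of that computation, assuming gold and predicted tokens are already aligned one-to-one and given as (head, deprel) pairs. The function name and the toy data are illustrative and are not part of this repository; the numbers in the table were not produced with this snippet.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for two aligned lists of (head, deprel) pairs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token sequences must be aligned")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # Unlabelled: only the head index has to match.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # Labelled: head index and dependency label both have to match.
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return head_ok / len(gold), both_ok / len(gold)


# Toy example (made up): the third token has a wrong label, the fourth a wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
```

In the toy example three of four heads are right and two of four tokens have both head and label right, so a wrong label lowers LAS without affecting the unlabelled score.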

Visualization of the current parser

The model is hosted on DigitalOcean: https://parser.yerevann.com/

Instructions (for end-to-end parsing)

Clone the repo (include the --recursive flag so that the submodules are fetched as well)

git clone --recursive https://github.com/Armtreebank/End-to-end-Parser.git

Install the requirements from inside the cloned directory

cd End-to-end-Parser
pip install -r requirements.txt

Run the following command to get the .conllu file with predictions for every sentence of the input

python3 predict.py --model_path path_to_model.pkl --input_path sample.txt --output_path sample.conllu
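
The resulting file follows the CoNLL-U layout: one token per line with ten tab-separated columns, comment lines starting with #, and a blank line between sentences. As a rough illustration of how the output could be consumed, here is a small reader written only with the Python standard library. The function name read_conllu and the sample text are assumptions made for this sketch (the sample is hand-written, not actual parser output), and the reader does no special handling of multiword-token or empty-node lines.

```python
# Column names defined by the CoNLL-U format, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(text):
    """Split CoNLL-U text into sentences; each token becomes a dict of columns."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written toy sample, for illustration only.
sample = (
    "# text = Ani reads\n"
    "1\tAni\tAni\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\treads\tread\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)
sentences = read_conllu(sample)
for token in sentences[0]:
    print(token["form"], token["lemma"], token["upos"], token["head"], token["deprel"])
```

To read real output, pass the contents of the generated file instead of the sample, for example the text obtained by opening sample.conllu with UTF-8 encoding.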

Instructions (for COMBO training)

cd COMBO
python3 -m src.main --mode autotrain --train train_data_path.conllu --valid valid_data_path.conllu --model model.pkl --force_trees

Acknowledgements

This project is supported by ANSEF grant Lingu-5008 and an ISTC Research Grant.
