AmuNMT for Automatic Post Editing
This page provides data and model files for our shared-task-winning APE system described in Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing. If you use any of the data, systems, or ideas, please cite:
@InProceedings{junczysdowmunt-grundkiewicz:2016:WMT,
author = {Junczys-Dowmunt, Marcin and Grundkiewicz, Roman},
title = {Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing},
booktitle = {Proceedings of the First Conference on Machine Translation},
month = {August},
year = {2016},
address = {Berlin, Germany},
publisher = {Association for Computational Linguistics},
pages = {751--758},
url = {http://www.aclweb.org/anthology/W16-2378}
}
Download the training data (514M)
This file contains the artificially generated post-editing triplets described in Table 1 of the paper: "4M" is the larger set denoted as "round-trip.n10" in that table, "500K" is the smaller set denoted as "round-trip.n1". The 20-times-oversampled original training data for the shared task is not included, but can be obtained from the original shared task page.
data
├── 4M
│ ├── 4M.mt
│ ├── 4M.pe
│ └── 4M.src
└── 500K
├── 500K.mt
├── 500K.pe
└── 500K.src
Download the systems (2.7G)
We also provide the complete primary system and two contrastive variants. To recreate the submitted output, locate the Makefile and set the path to the main directory of your working AmuNMT checkout (latest master, see the Readme) in the following line:
AMUNMT=/home/marcinj/Badania/amunmt
Next, type make. The included files provide all input files, model files, and scripts needed to produce our exact submission. At the end you should see the three submission files:
AMU_ensemble8-mt+src_PRIMARY
AMU_ensemble4-mt_CONTRASTIVE
AMU_ensemble4-src_CONTRASTIVE
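If you do not have a compiled AmuNMT yet, the following sketch outlines the usual CMake build; the repository URL is a placeholder here, and the prerequisites (e.g. CUDA, Boost) are listed in the AmuNMT Readme:

# build AmuNMT from the latest master (CMake project)
git clone <amunmt-repository-url> amunmt
cd amunmt
mkdir build && cd build
cmake ..
make -j4

Afterwards, set AMUNMT in the system Makefile to this checkout and run make as described above.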
In the future we will provide more hints on how to train a similar system. Currently we supply the following files:
system
├── data
│ ├── de.bpe
│ ├── en.bpe
│ ├── true.de
│ └── true.en
├── Makefile
├── models
│ ├── configs
│ │ ├── mt-pe.ensemble4.tuned.yml
│ │ ├── mtsrc-pe.ensemble.ape.tuned.yml
│ │ └── src-pe.ensemble4.yml
│ ├── mt-pe
│ │ ├── model.iter260000.npz
│ │ ├── model.iter270000.npz
│ │ ├── model.iter280000.npz
│ │ ├── model.iter290000.npz
│ │ ├── vocab.mt.json
│ │ └── vocab.pe.json
│ └── src-pe
│ ├── model.iter340000.npz
│ ├── model.iter350000.npz
│ ├── model.iter360000.npz
│ ├── model.iter370000.npz
│ ├── vocab.pe.json
│ └── vocab.src.json
├── scripts
│ ├── apply_bpe.py
│ ├── deescape-special-chars.perl
│ ├── detruecase.perl
│ ├── escape-special-chars.perl
│ ├── prepare_submission.py
│ └── truecase.perl
└── test
├── test.mt
└── test.src
where data contains the truecasing models and BPE codes, and models/configs provides the configuration files for amun to load the model ensembles located in mt-pe (monolingual models, trained on MT output and post-editing data) and src-pe (bilingual models, trained on source and post-editing data).
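For orientation, the processing chain that the Makefile encodes corresponds roughly to the steps below. The exact flags and ordering live in the Makefile itself, so treat this as a sketch: the amun binary location and the script options are assumptions based on the standard Moses and subword-nmt interfaces.

cd system

# preprocess the German MT output: escape special characters,
# truecase, apply BPE
cat test/test.mt \
  | perl scripts/escape-special-chars.perl \
  | perl scripts/truecase.perl --model data/true.de \
  | python scripts/apply_bpe.py -c data/de.bpe \
  > test.mt.bpe

# decode with the tuned mt-pe ensemble (the primary mt+src system
# additionally feeds the preprocessed test.src through the src-pe models)
$AMUNMT/build/bin/amun -c models/configs/mt-pe.ensemble4.tuned.yml \
  < test.mt.bpe > out.bpe

# postprocess: undo BPE, detruecase, unescape
cat out.bpe \
  | sed -r 's/(@@ )|(@@ ?$)//g' \
  | perl scripts/detruecase.perl \
  | perl scripts/deescape-special-chars.perl \
  > out.txt

# prepare_submission.py then wraps out.txt into the WMT submission
# format; see the Makefile for its exact arguments.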