System Combination via Quality Estimation for Grammatical Error Correction

This repository provides the code to easily score, re-rank, and combine corrections from Grammatical Error Correction (GEC) models, as reported in this paper:

System Combination via Quality Estimation for Grammatical Error Correction
Muhammad Reza Qorib and Hwee Tou Ng
The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Installation

Please install the necessary libraries by running the following commands:

pip install -r requirements.txt
wget -P models https://sterling8.d2.comp.nus.edu.sg/~reza/GRECO/checkpoint.bin
wget https://www.comp.nus.edu.sg/~nlp/sw/m2scorer.tar.gz
tar -xf m2scorer.tar.gz

Please check that the installed PyTorch build matches the CUDA version on your machine.
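One way to do this check is a short Python snippet that reads PyTorch's own version attributes. The helper name `pytorch_cuda_report` is hypothetical and not part of this repository. It is a sketch only: compare `built_with_cuda` against the CUDA version your driver supports (for example, as reported by `nvidia-smi`).

```python
import importlib.util


def pytorch_cuda_report():
    """Summarize the installed PyTorch build and CUDA visibility.

    Returns None when PyTorch is not installed, so the check never crashes.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch_version": str(torch.__version__),
        # CUDA version PyTorch was built against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # True only if a compatible GPU and driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    report = pytorch_cuda_report()
    if report is None:
        print("PyTorch is not installed.")
    else:
        for key, value in report.items():
            print(f"{key}: {value}")
```

If `cuda_available` is False on a GPU machine, the installed wheel is usually CPU-only or built for a different CUDA version.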

To also run other quality estimation models, please run the following commands:

git clone https://github.com/nusnlp/neuqe
git clone https://github.com/thunlp/VERNet
git clone https://github.com/kokeman/SOME

Then download the model checkpoints from each repository into the corresponding folder:

https://github.com/nusnlp/neuqe to the checkpoints/neuqe folder.
https://github.com/thunlp/VERNet/ to the checkpoints/vernet folder.
https://github.com/kokeman/SOME to the checkpoints/some folder.
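Before running any scripts, a quick sanity check can confirm that the downloads landed where this README expects them. This is a hypothetical helper, not part of the repository; it only checks the paths named in this README, treating an empty folder as missing, because the checkpoint file names inside each `checkpoints/` subfolder depend on the upstream repositories.

```python
from pathlib import Path

# Paths named in this README; file names inside checkpoints/* are not checked.
EXPECTED_PATHS = [
    "models/checkpoint.bin",
    "m2scorer",
    "checkpoints/neuqe",
    "checkpoints/vernet",
    "checkpoints/some",
]


def missing_paths(root="."):
    """Return the expected paths that are absent (or empty folders) under root."""
    root = Path(root)
    missing = []
    for rel in EXPECTED_PATHS:
        path = root / rel
        if not path.exists():
            missing.append(rel)
        elif path.is_dir() and not any(path.iterdir()):
            # An empty folder usually means the download step was skipped
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_paths():
        print(f"missing or empty: {rel}")
```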

Quality Estimation

Scoring hypotheses in your code

You can import the GRECO class from models.py, instantiate it, and pass the source sentence(s) and hypotheses (as Python lists of strings) to its .score() method.

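import torch
from models import GRECO

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = GRECO('microsoft/deberta-v3-large').to(device)
model.load_state_dict(torch.load('models/checkpoint.bin', map_location=device))
# source and hypotheses are Python lists of strings
scores = model.score(source, hypotheses)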
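When each source sentence has several candidate corrections, one way to prepare the input is to repeat each source once per candidate so that the two lists are parallel, and then regroup the returned scores per sentence. This sketch assumes that .score() takes parallel lists of equal length and returns one score per hypothesis in order; please check models.py before relying on it. The helper names are hypothetical, and a stand-in scorer replaces the model.

```python
from typing import List, Sequence, Tuple


def flatten_candidates(
    sources: Sequence[str], candidates: Sequence[Sequence[str]]
) -> Tuple[List[str], List[str], List[int]]:
    """Repeat each source once per candidate so both lists are parallel.

    Returns (flat_sources, flat_hypotheses, counts), where counts[i] is the
    number of candidates for sources[i].
    """
    if len(sources) != len(candidates):
        raise ValueError("need one candidate list per source")
    flat_sources, flat_hyps, counts = [], [], []
    for src, cands in zip(sources, candidates):
        flat_sources.extend([src] * len(cands))
        flat_hyps.extend(cands)
        counts.append(len(cands))
    return flat_sources, flat_hyps, counts


def regroup_scores(flat_scores: Sequence[float], counts: Sequence[int]) -> List[List[float]]:
    """Split a flat score list back into one list per source sentence."""
    grouped, start = [], 0
    for n in counts:
        grouped.append(list(flat_scores[start:start + n]))
        start += n
    if start != len(flat_scores):
        raise ValueError("score count does not match candidate count")
    return grouped


# Usage with a stand-in scorer (hypothesis length) in place of model.score().
sources = ["He go to school .", "She like apple ."]
candidates = [["He goes to school .", "He go to school ."], ["She likes apples ."]]
srcs, hyps, counts = flatten_candidates(sources, candidates)
fake_scores = [float(len(h)) for h in hyps]
print(regroup_scores(fake_scores, counts))
```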

Correlation coefficient

Score all the texts in the data directory by running the following command. In this example, we also score the same texts with SOME (second command) for comparison.

python score_all.py --auto --data_dir data/conll-official/texts --output_path outputs/greco_scores.json --model greco --lm_model microsoft/deberta-v3-large --checkpoint models/checkpoint.bin --source_file data/conll-source.txt --batch_size 16
python score_all.py --auto --data_dir data/conll-official/texts --output_path outputs/some_scores.json --model some --source_file data/conll-source.txt --batch_size 16

Get the gold F0.5 score for each sentence by running this command.

python m2_for_corr.py --data_dir data/conll-official/reports --scorer m2scorer --output_path outputs/target.json
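For reference, F0.5 combines edit-level precision P and recall R, with recall weighted half as much as precision: F0.5 = (1 + 0.5²)·P·R / (0.5²·P + R). The sketch below computes it from edit counts for one sentence. It illustrates the formula only. Treating an empty denominator as a precision or recall of 1.0 is an assumption here and may not match m2scorer's internals exactly.

```python
def f_beta(correct: int, proposed: int, gold: int, beta: float = 0.5) -> float:
    """F-beta from edit counts: correct, proposed (system), gold (reference).

    Empty denominators are treated as perfect precision/recall (1.0); this
    convention is an assumption and may differ from m2scorer's internals.
    """
    precision = correct / proposed if proposed > 0 else 1.0
    recall = correct / gold if gold > 0 else 1.0
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


# 2 correct edits out of 3 proposed, with 4 gold edits: P = 2/3, R = 1/2
print(f_beta(2, 3, 4))
```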

Calculate the correlation by running this command:

python correlation.py --system_A outputs/greco_scores.json --system_B outputs/some_scores.json --target outputs/target.json --metric spearman
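Spearman's rho ranks both score lists (tied values share their average rank) and then takes the Pearson correlation of those ranks. The following pure-Python sketch is useful for sanity-checking small inputs; it does not reproduce how correlation.py reads or aligns the JSON files. scipy.stats.spearmanr computes the same statistic.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```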

Re-ranking

Reproducing re-ranking F0.5 score

Run the following command to re-rank the corrections:

python rerank.py --data_dir data/conll-official/texts --source_file data/conll-source.txt --auto --output_path outputs/greco_rerank.out --model greco --lm_model microsoft/deberta-v3-large --checkpoint models/checkpoint.bin --batch_size 16
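Conceptually, re-ranking keeps the candidate correction with the highest quality-estimation score for every source sentence. The following is a minimal sketch of that selection step, assuming the scores have already been computed and grouped per sentence. The function name and the tie-breaking rule (earliest candidate wins) are illustrative and may not match rerank.py.

```python
from typing import List, Sequence


def rerank(candidates: Sequence[Sequence[str]], scores: Sequence[Sequence[float]]) -> List[str]:
    """Pick the highest-scoring candidate for each source sentence.

    Ties go to the earliest candidate, since max() returns the first maximum.
    """
    if len(candidates) != len(scores):
        raise ValueError("need one score list per source sentence")
    chosen = []
    for cands, cand_scores in zip(candidates, scores):
        if not cands or len(cands) != len(cand_scores):
            raise ValueError("each sentence needs a non-empty, fully scored candidate list")
        best = max(range(len(cands)), key=lambda i: cand_scores[i])
        chosen.append(cands[best])
    return chosen


candidates = [["He goes to school .", "He go to school ."],
              ["She likes apples .", "She like apples ."]]
scores = [[0.91, 0.40], [0.75, 0.75]]
# One corrected sentence per line, the layout m2scorer expects for system output
print("\n".join(rerank(candidates, scores)))
```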

Run the following command to get the F0.5 score:

python2 m2scorer/scripts/m2scorer.py outputs/greco_rerank.out data/conll-2014.m2

Re-ranking your top-k model outputs

You can run the same command as above, changing only the --data_dir argument. For each k, write the k-th best correction of every source sentence into a single file, put all of these files in one folder, and pass that folder path to --data_dir. The code reads every file inside that folder; see data/conll-official/texts for an example.
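The following is one way to produce such a folder from n-best lists. This is a sketch only. It assumes that each file holds one correction per line in the same order as the source file, and that every sentence has exactly k candidates. The `top{k}.txt` file names and the helper name are hypothetical, so compare the result with data/conll-official/texts before use.

```python
from pathlib import Path
from typing import List, Sequence


def write_topk_folder(nbest: Sequence[Sequence[str]], out_dir: str) -> List[Path]:
    """Write the k-th best correction of every sentence to its own file.

    nbest[i][k] is the k-th best correction of source sentence i. Each output
    file holds one correction per line, in source order.
    """
    k_max = len(nbest[0]) if nbest else 0
    if any(len(cands) != k_max for cands in nbest):
        raise ValueError("every sentence needs the same number of candidates")
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for k in range(k_max):
        path = folder / f"top{k + 1}.txt"  # hypothetical naming scheme
        lines = [cands[k] for cands in nbest]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

The resulting folder path can then be passed to --data_dir in the rerank.py command above.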