This repository contains the datasets and materials for the Shared Task on coreference resolution, to be held as part of the CORBON workshop at EACL 2017. The description of the shared task can be found here: http://corbon.nlp.ipipan.waw.pl/index.php/shared-task/
This repository contains:
- raw parallel data (data/raw): the English-German and English-Russian News-Commentary11 raw sentence-aligned corpus (Tiedemann, 2012), split into documents and tokenised with the EuroParl tools (Koehn, 2005).
- coreference-resolved English data (data/training): the English part of the News-Commentary11 corpus, with coreference resolved by the Berkeley Entity Resolution System in coref-predict mode (Durrett and Klein, 2013).
- annotation guidelines (parallel_annotation_guidelines.pdf): parallel pronominal coreference annotation guidelines as described in (Grishina and Stede, 2015).
- sample annotation (data/sample_annotation): sample files annotated according to the annotation guidelines listed above.
- test data for German and Russian (data/test).
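
After cloning, the directory layout listed above can be checked with a short script. The following is a minimal sketch, not part of the shared task tooling: the directory names come from the list above, while the `summarize` helper and the assumption that the script is run from the repository root are purely illustrative. It makes no assumption about file names or formats inside each directory.

```python
from pathlib import Path

# Directory names taken from the list above; everything else is an assumption.
DATA_DIRS = ["data/raw", "data/training", "data/sample_annotation", "data/test"]


def summarize(repo_root="."):
    """Return {directory: number of regular files found beneath it}."""
    root = Path(repo_root)
    counts = {}
    for rel in DATA_DIRS:
        d = root / rel
        if d.is_dir():
            # Count files recursively, in case documents sit in subfolders.
            counts[rel] = sum(1 for p in d.rglob("*") if p.is_file())
        else:
            # A missing directory is reported as 0 rather than raising.
            counts[rel] = 0
    return counts


if __name__ == "__main__":
    for rel, n in summarize().items():
        print(f"{rel}: {n} files")
```

A count of 0 for any directory suggests an incomplete checkout or that the script was started outside the repository root.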