KEOPS (Keen Evaluation Of Parallel Sentences) provides a complete tool for manual evaluation of parallel sentences.
This is an overview of the application. Please refer to the Evaluator Guide or the Project Manager Guide for detailed information.
If you want to deploy KEOPS, please refer to the Installation Guide.
There are two main types of users in Keops: Users (also known as "Evaluators") and Project Managers. Project Managers can perform management tasks (such as creating tasks, uploading corpora to evaluate, invite users...) and evaluation tasks, while regular Users can only perform evaluation tasks (assigned by the Project Managers).
The most common workflow for a Project Manager is as follows:
First, the Project Manager invites the evaluators to Keops, if they are not Keops users yet. For this, the Project Manager just needs their email addresses. The invitation token generated by Keops will be automatically sent to them.
The invitation token, together with the user email, is what an user needs to create its account.
Then the project manager uploads one (or more) corpus, indicating the source (except for Fluency evaluation tasks) and target languages. Most common EU languages are pre-loaded, but Project Managers can add new ones beforehand, if needed.
After these two first steps, the Project Manager creates a new Project, indicating a name and a description.
When the project is created, the Project Manager can create Tasks in the Project. For this, the Project Manager just needs to indicate the source (except for Fluency evaluation tasks) and target languages, the evaluator and the corpus to be evaluated.
Please note that only users and corpora matching the task languages will be available as choice.
When you assign a task to an evaluator, they will be notified via email.
Once the task is created, Evaluators can immediately start working on them. When an Evaluator finishes evaluating all sentences from a task, and tags it as DONE, both the Project Manager and the evaluator are able to download the stats of the evaluation (a TSV file containing statistics about the task) and the annotated corpus (a TSV file containing the sentences and their evaluation). The Project Manager will be notified via email when an evaluator finishes a task.
The common workflow for an Evaluator is simpler:
After getting an invitation (an invitation token, together with the evaluator's email), the Evaluator signs up and creates an account. It's important that Evaluators include their languages, so Project Managers can assign tasks to them.
If the Evaluator already had an account in Keops, instead of signing up, the Evaluator just needs to sign in.
Once logged in, in their Keops homepage, Evaluators can see a list of tasks they are assigned to. They will also receive an email when they are assigned a task. Evaluators just need to click the start/continue button of the desired task, and then start evaluating.
The evaluation page will change depending on the type of the task which is being performed. KEOPS supports Validation, Ranking and evaluation of Adequacy and Fluency. For more information about this matter, please refer to the Modes section in the Evaluators Guide.
For example, Validation tasks are used to classify pairs of sentences using the European Language Resource Coordination (ELRC) validation guidelines:
Once all sentences of a task are evaluated, the Evaluator is redirected to the recap page for the task, where the evaluation stats of the task are shown. By marking the task as DONE, the Evaluator states that the task is finished, and from that moment, the Project Manager is able to download the stats of the evaluation and the evaluated sentences.
A project can have several tasks (see table "tasks"), each one consisting of several sentences (see table "sentences_tasks").
Sentences belong to a corpus (see table "corpora" and "sentences"), but can be used in several tasks (see table "sentences_tasks")
Sentences are uploaded individually (see table "sentences"), tagged as source or target and then related to each other (see table "sentences_pairing").
A task can only have one user assigned to it (see table "tasks"), but one user can have several tasks assignated.
An user ( project manager or evaluator) can have to several languages associated (see table "user_langs").
An project manager can have several projects (see table "projects"), but each project has only one project manager (owner).
Corpora is always uploaded in TSV format. This format uses one line per record and a tab character to separate fields.
The specific format of the TSV file is explained below for each of the evaluation modes. This information is alsa available on KEOPS clicking on First time uploading corpora. You can also download a template and use it to upload your data:
Please note that all formats contain headers which are necessary in order to upload corpora to KEOPS. If you have a corpus without headers, you can add them using:
echo -e "Header 1\tHeader 2" | cat - corpus > corpus-with-headers
Replace Header 1\tHeader 2
with the actual headers.
Source text | Tab | Target text |
---|---|---|
You can contact our customer service department using the form below | Tab | Puedes ponerte en contacto con nuestro departamento de servicio al cliente mediante el siguiente formulario |
You should only include one target text for each source text.
Source text | Tab | Candidate translation |
---|---|---|
You can contact our customer service department using the form below | Tab | Puedes ponerte en contacto con nuestro departamento de servicio al cliente mediante el siguiente formulario |
You should only include one candidate translation for each source text.
Candidate translation |
---|
Puedes ponerte en contacto con nuestro departamento de servicio al cliente mediante el siguiente formulario |
Corpora for fluency evaluation consist only on one column because they are monolingual tasks.
Source text | Tab | Reference text | Tab | Name of system 1 | Tab | Name of system 2 | Tab | ... |
---|---|---|---|---|---|---|---|---|
You can contact our customer service department using the form below | Tab | Puedes ponerte en contacto con nuestro departamento de servicio al cliente mediante el siguiente formulario | Tab | Manual de empleo y manutención | Tab | Manual de empleo y manutención | Tab | ... |
Include as many systems as you want.
Preloaded languages are: Bulgarian (bg), Czech (cs), Danish (da), German (de), Greek (el), English (en), Spanish (es), Estonian (et), Finnish (fi), French (fr),
Irish (ga), Croatian (hr), Hungarian (hu), Italian (it), Lithuanian (lt), Latvian (lv), Maltese (mt), Norwegian - bokmal (no), Norwegian - nynorsk (nn), Dutch (nl), Polish (pl), Portuguese (pt), Romanian (ro),
Slovak (sk), Slovenian (sl) and Swedish (sv).
But remember: Project Managers can add as many languages as needed, at any time!
Evaluators working with Keops must follow the European Language Resource Coordination (ELRC) validation guidelines.
To ensure consistency from one evaluator to another, the following system has been adopted for grading translations. Evaluators should use the following types/labels to tag problematic cases:
- Wrong language identification
- Incorrect alignment
- Wrong tokenization
- MT translation
- Translation error
- Free translation
For more information on each label, please check the ELRC Validation Guidelines document, section "4.2.2.2 Validation by human experts".
Remember: Evaluators can refer to the Validation Guidelines at any time, just clicking in the link in the Evaluation window.
The Task Summary (or task stats) file is a TSV file containing a summary of the task once it is finished. The format will change depending on the type of the task.
Each line contains the Label code, the Label description and the amount of entries tagged with that label in the task. For example:
Label Description Count
L Wrong language id. 44
A Incorrect alignment 150
T Wrong tokenization 204
MT MT translation 97
E Translation error 70
F Free translation 39
V Valid translation 396
P Pending 0
Total Total 1000
Each line contains a percentage (in steps of 10) and the amount of sentences evaluated with that score. For example:
Percentage # of sentences
0 3
10 2
20 5
30 1
40 1
50 0
60 2
70 4
80 3
90 4
100 1
Each line contains a percentage (in steps of 10) and the amount of sentences evaluated with that score. For example:
Percentage # of sentences
0 0
10 0
20 1
30 0
40 0
50 20
60 0
70 0
80 0
90 0
100 0
Each line contains the name of a system and its position in the ranking. For example:
System Position
Google 7
Microsoft 1
DeepL 6
Apertium 3
PROMPT 3
The Annotated Sentences file is a TSV file containing all sentences that were evaluated in a task, as well as their evaluations and Evaluator comments (if any).
The format will change depending on the type of the task.
Each line contains the source and target texts, the source and target languages, the evaluation, description, evaluation details and the time it took the evaluator to evaluate the sentence. An example of an annotated sentences file follows:
Source | Target | Source lang | Target lang | Evaluation | Description | Evaluation details | Time |
---|---|---|---|---|---|---|---|
Use and maintenance manual | Manual de empleo y mantenimiento | en | es | V | Valid translation | 28.5 | |
Tooling for cup chain aka tennis chain | Herramienta para cadena tennis | en | es | MT | MT translation | 30.1 |
Each line contains the source text, the target text, source and target languages, the score of the sentence, description, evaluation details and the time it took the evaluator to evaluate the sentence.
Source | Target 1 | Source lang | Target lang | Evaluation | Description | Evaluation details | Time |
---|---|---|---|---|---|---|---|
If you did not find what wou were looking for, please use our custom search engine: | Si no encontró lo que está buscando, pruebe nuestro motor de búsqueda personalizada! | en | es | 20 | 20.1 | ||
In this page, you will find information about Guided Tours of Dumbria. | En esta página encontrarás información sobre Visitas Guiadas de Dumbria. | en | es | 70 | 25.0 |
Each line contains the target text, the target language, the score of the sentence, description, evaluation details and the time it took the evaluator to evaluate the sentence.
Target | Target lang | Evaluation | Description | Evaluation details | Time |
---|---|---|---|---|---|
If you did not find what wou were looking for, please use our custom search engine: | en | 20 | 18.5 | ||
In this page, you will find information about Guided Tours of Dumbria. | en | 70 | 21.2 |
Each line contains a source text, a reference text, the candidate translations, the position of each one in JSON format, description, evaluation details and the time it took the evaluator to evaluate the sentence. For example:
Source | Target | Microsoft | DeepL | Apertium | PROMPT | Source lang | Target lang | Evaluation | Description | Evaluation details | Time | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Use and maintenance manual | Manual de empleo y manutención | Manual de empleo y manutención | Manual de empleo y manutención. | Manual de empleo y manutención | Manual de empleo y manutención | Manual de empleo y manutención. | en | es | {"Google":"1","Microsoft":"2","DeepL":"3","Apertium":"4","PROMPT":"5"} | 55.8 |