KEOPS (Keen Evaluator Of Parallel Sentences)

KEOPS (Keen Evaluation Of Parallel Sentences) provides a complete tool for manual evaluation of parallel sentences.

This is an overview of the application. Please refer to the Evaluator Guide or the Project Manager Guide for detailed information.

If you want to deploy KEOPS, please refer to the Installation Guide.

Overview

There are two main types of users in Keops: Users (also known as "Evaluators") and Project Managers. Project Managers can perform management tasks (such as creating tasks, uploading corpora to evaluate, invite users...) and evaluation tasks, while regular Users can only perform evaluation tasks (assigned by the Project Managers).

Project Managers

The most common workflow for a Project Manager is as follows:

First, the Project Manager invites the evaluators to Keops, if they are not Keops users yet. For this, the Project Manager just needs their email addresses. The invitation token generated by Keops will be automatically sent to them.

The invitation token, together with the user email, is what an user needs to create its account.

Then the project manager uploads one (or more) corpus, indicating the source (except for Fluency evaluation tasks) and target languages. Most common EU languages are pre-loaded, but Project Managers can add new ones beforehand, if needed.

After these two first steps, the Project Manager creates a new Project, indicating a name and a description.

When the project is created, the Project Manager can create Tasks in the Project. For this, the Project Manager just needs to indicate the source (except for Fluency evaluation tasks) and target languages, the evaluator and the corpus to be evaluated.

Please note that only users and corpora matching the task languages will be available as choice.

When you assign a task to an evaluator, they will be notified via email.

Once the task is created, Evaluators can immediately start working on them. When an Evaluator finishes evaluating all sentences from a task, and tags it as DONE, both the Project Manager and the evaluator are able to download the stats of the evaluation (a TSV file containing statistics about the task) and the annotated corpus (a TSV file containing the sentences and their evaluation). The Project Manager will be notified via email when an evaluator finishes a task.

Evaluators

The common workflow for an Evaluator is simpler:

After getting an invitation (an invitation token, together with the evaluator's email), the Evaluator signs up and creates an account. It's important that Evaluators include their languages, so Project Managers can assign tasks to them.

If the Evaluator already had an account in Keops, instead of signing up, the Evaluator just needs to sign in.

Once logged in, in their Keops homepage, Evaluators can see a list of tasks they are assigned to. They will also receive an email when they are assigned a task. Evaluators just need to click the start/continue button of the desired task, and then start evaluating.

The evaluation page will change depending on the type of the task which is being performed. KEOPS supports Validation, Ranking and evaluation of Adequacy and Fluency. For more information about this matter, please refer to the Modes section in the Evaluators Guide.

For example, Validation tasks are used to classify pairs of sentences using the European Language Resource Coordination (ELRC) validation guidelines:

Once all sentences of a task are evaluated, the Evaluator is redirected to the recap page for the task, where the evaluation stats of the task are shown. By marking the task as DONE, the Evaluator states that the task is finished, and from that moment, the Project Manager is able to download the stats of the evaluation and the evaluated sentences.

DB Schema

A project can have several tasks (see table "tasks"), each one consisting of several sentences (see table "sentences_tasks").

Sentences belong to a corpus (see table "corpora" and "sentences"), but can be used in several tasks (see table "sentences_tasks")

Sentences are uploaded individually (see table "sentences"), tagged as source or target and then related to each other (see table "sentences_pairing").

A task can only have one user assigned to it (see table "tasks"), but one user can have several tasks assignated.

An user ( project manager or evaluator) can have to several languages associated (see table "user_langs").

An project manager can have several projects (see table "projects"), but each project has only one project manager (owner).

FAQ: Frequently Asked Questions

Which format must corpora have to be properly uploaded to Keops?

Corpora is always uploaded in TSV format. This format uses one line per record and a tab character to separate fields.

The specific format of the TSV file is explained below for each of the evaluation modes. This information is alsa available on KEOPS clicking on First time uploading corpora. You can also download a template and use it to upload your data:

Please note that all formats contain headers which are necessary in order to upload corpora to KEOPS. If you have a corpus without headers, you can add them using:

echo -e "Header 1\tHeader 2" | cat - corpus > corpus-with-headers

Replace Header 1\tHeader 2 with the actual headers.

Corpora for validation

Source text	Tab	Target text
You can contact our customer service department using the form below	Tab	Puedes ponerte en contacto con nuestro departamento de servicio al cliente mediante el siguiente formulario

You should only include one target text for each source text.

Corpora for adequacy

Source text	Tab	Candidate translation
You can contact our customer service department using the form below	Tab	Puedes ponerte en contacto con nuestro departamento de servicio al cliente mediante el siguiente formulario

You should only include one candidate translation for each source text.

Corpora for fluency

Candidate translation
Puedes ponerte en contacto con nuestro departamento de servicio al cliente mediante el siguiente formulario

Corpora for fluency evaluation consist only on one column because they are monolingual tasks.

Corpora for ranking

Source text	Tab	Reference text	Tab	Name of system 1	Tab	Name of system 2	Tab	...
You can contact our customer service department using the form below	Tab	Puedes ponerte en contacto con nuestro departamento de servicio al cliente mediante el siguiente formulario	Tab	Manual de empleo y manutención	Tab	Manual de empleo y manutención	Tab	...

Include as many systems as you want.

Which languages are preloaded in Keops?

Preloaded languages are: Bulgarian (bg), Czech (cs), Danish (da), German (de), Greek (el), English (en), Spanish (es), Estonian (et), Finnish (fi), French (fr),
Irish (ga), Croatian (hr), Hungarian (hu), Italian (it), Lithuanian (lt), Latvian (lv), Maltese (mt), Norwegian - bokmal (no), Norwegian - nynorsk (nn), Dutch (nl), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl) and Swedish (sv).

But remember: Project Managers can add as many languages as needed, at any time!

What are "validation guidelines"?

Evaluators working with Keops must follow the European Language Resource Coordination (ELRC) validation guidelines.

To ensure consistency from one evaluator to another, the following system has been adopted for grading translations. Evaluators should use the following types/labels to tag problematic cases:

Wrong language identification
Incorrect alignment
Wrong tokenization
MT translation
Translation error
Free translation

For more information on each label, please check the ELRC Validation Guidelines document, section "4.2.2.2 Validation by human experts".

Remember: Evaluators can refer to the Validation Guidelines at any time, just clicking in the link in the Evaluation window.

Which format does the "task summary" file have?

The Task Summary (or task stats) file is a TSV file containing a summary of the task once it is finished. The format will change depending on the type of the task.

Validation

Each line contains the Label code, the Label description and the amount of entries tagged with that label in the task. For example:

Label 	Description           Count
L     	Wrong language id. 	  44
A     	Incorrect alignment   150
T     	Wrong tokenization    204
MT    	MT translation        97
E     	Translation error     70
F     	Free translation      39
V     	Valid translation     396
P     	Pending               0
Total 	Total                 1000

Adequacy

Each line contains a percentage (in steps of 10) and the amount of sentences evaluated with that score. For example:

Percentage 	# of sentences
0          	3
10         	2
20         	5
30         	1
40         	1
50         	0
60         	2
70         	4
80         	3
90         	4
100        	1

Fluency

Each line contains a percentage (in steps of 10) and the amount of sentences evaluated with that score. For example:

Percentage 	# of sentences
0          	0
10         	0
20         	1
30         	0
40         	0
50         	20
60         	0
70         	0
80         	0
90         	0
100        	0

Ranking

Each line contains the name of a system and its position in the ranking. For example:

System        Position
Google        7
Microsoft     1
DeepL         6
Apertium      3
PROMPT        3

Which format does the "annotated sentences" file have?

The Annotated Sentences file is a TSV file containing all sentences that were evaluated in a task, as well as their evaluations and Evaluator comments (if any).

The format will change depending on the type of the task.

Validation

Each line contains the source and target texts, the source and target languages, the evaluation, description, evaluation details and the time it took the evaluator to evaluate the sentence. An example of an annotated sentences file follows:

Source	Target	Source lang	Target lang	Evaluation	Description	Evaluation details	Time
Use and maintenance manual	Manual de empleo y mantenimiento	en	es	V	Valid translation		28.5
Tooling for cup chain aka tennis chain	Herramienta para cadena tennis	en	es	MT	MT translation		30.1

Adequacy

Each line contains the source text, the target text, source and target languages, the score of the sentence, description, evaluation details and the time it took the evaluator to evaluate the sentence.

Source	Target 1	Source lang	Target lang	Evaluation	Description	Evaluation details	Time
If you did not find what wou were looking for, please use our custom search engine:	Si no encontró lo que está buscando, pruebe nuestro motor de búsqueda personalizada!	en	es	20			20.1
In this page, you will find information about Guided Tours of Dumbria.	En esta página encontrarás información sobre Visitas Guiadas de Dumbria.	en	es	70			25.0

Fluency

Each line contains the target text, the target language, the score of the sentence, description, evaluation details and the time it took the evaluator to evaluate the sentence.

Target	Target lang	Evaluation	Description	Evaluation details	Time
If you did not find what wou were looking for, please use our custom search engine:	en	20			18.5
In this page, you will find information about Guided Tours of Dumbria.	en	70			21.2

Ranking

Each line contains a source text, a reference text, the candidate translations, the position of each one in JSON format, description, evaluation details and the time it took the evaluator to evaluate the sentence. For example:

Source	Target	Google	Microsoft	DeepL	Apertium	PROMPT	Source lang	Target lang	Evaluation	Description	Evaluation details	Time
Use and maintenance manual	Manual de empleo y manutención	Manual de empleo y manutención	Manual de empleo y manutención.	Manual de empleo y manutención	Manual de empleo y manutención	Manual de empleo y manutención.	en	es	{"Google":"1","Microsoft":"2","DeepL":"3","Apertium":"4","PROMPT":"5"}			55.8

Name		Name	Last commit message	Last commit date
Latest commit History 582 Commits
admin		admin
automigration		automigration
corpora		corpora
css		css
dao		dao
dto		dto
img		img
js		js
languages		languages
migration		migration
projects		projects
resources		resources
screenshots		screenshots
sentences		sentences
services		services
tasks		tasks
users		users
utils		utils
.gitignore		.gitignore
Dockerfile		Dockerfile
Dockerfile-db		Dockerfile-db
INSTALLATION.md		INSTALLATION.md
LICENSE		LICENSE
README.md		README.md
composer.json		composer.json
composer.lock		composer.lock
config.php		config.php
configure-db.sh		configure-db.sh
configure-keops.sh		configure-keops.sh
contact.php		contact.php
db-readme.txt		db-readme.txt
docker-compose.yml		docker-compose.yml
docker-entrypoint-db.sh		docker-entrypoint-db.sh
docker-entrypoint.sh		docker-entrypoint.sh
error.php		error.php
evaluators.md		evaluators.md
favicon.ico		favicon.ico
forgot_password.php		forgot_password.php
index.php		index.php
keops.conf		keops.conf
keops.png		keops.png
keopsdb_init.sql		keopsdb_init.sql
pm.md		pm.md
renew_password.php		renew_password.php
set-password.sh		set-password.sh
signin.php		signin.php
signup.php		signup.php

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KEOPS (Keen Evaluator Of Parallel Sentences)

Table of Contents

Overview

Project Managers

Evaluators

DB Schema

FAQ: Frequently Asked Questions

Which format must corpora have to be properly uploaded to Keops?

Corpora for validation

Corpora for adequacy

Corpora for fluency

Corpora for ranking

Which languages are preloaded in Keops?

What are "validation guidelines"?

Which format does the "task summary" file have?

Validation

Adequacy

Fluency

Ranking

Which format does the "annotated sentences" file have?

Validation

Adequacy

Fluency

Ranking

About

Releases

Packages

Contributors 5

Languages

License

paracrawl/keops

Folders and files

Latest commit

History

Repository files navigation

KEOPS (Keen Evaluator Of Parallel Sentences)

Table of Contents

Overview

Project Managers

Evaluators

DB Schema

FAQ: Frequently Asked Questions

Which format must corpora have to be properly uploaded to Keops?

Corpora for validation

Corpora for adequacy

Corpora for fluency

Corpora for ranking

Which languages are preloaded in Keops?

What are "validation guidelines"?

Which format does the "task summary" file have?

Validation

Adequacy

Fluency

Ranking

Which format does the "annotated sentences" file have?

Validation

Adequacy

Fluency

Ranking

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages