CORANNO

Corpus Annotation Tool

CORANNO is a Natural Language Processing (NLP) tool for creating fully annotated, portable, and reusable corpora, with classification/sentiment and POS/entity/NER annotations.

Features:

Multiple datasets
Multiple annotation projects
Entity annotation (NER)
Multilabel and multiclass classification
Tagging of a selected text area
Custom tag creation
Search in dataset
Filter a project's dataset by tag
Split sentences in a project by regular expression
View progress and stats
Corpus export in simple JSON format
Collaborative annotation
Docker ready

Based on doccano: https://github.com/chakki-works/doccano

Prerequisites

Python 3
Google Chrome (highly recommended)

Installing

pip install -r requirements.txt

Getting Started

Run the server:

python3 run.py 8000

Open the page in your browser:

http://localhost:8000/

Enter the login credentials:
- user: admin
- pass: admin

Create Dataset

To upload a dataset, click "Create Dataset", complete the form, and click "create".

Open the dataset by clicking its name. Select an upload mode and upload your files.

Modes:

TXT: each line should contain one sentence of text.
JSON: each line should contain a JSON object with at least a 'text' key holding the text. It may also have a "file" key with the name of the file.
PLAIN: one or more plain-text documents.
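
As a minimal sketch of preparing a file for the JSON mode, the snippet below writes one JSON object per line, each with a 'text' key and an optional "file" key, as described above. The helper name `write_json_lines`, the output file name, and the sample sentences are hypothetical and are not part of CORANNO itself.

```python
import json
from pathlib import Path


def write_json_lines(records, path):
    """Write one JSON object per line; every record must have a 'text' key."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            if "text" not in record:
                raise ValueError("each record needs a 'text' key")
            # json.dumps escapes embedded newlines, so each object stays on one line.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical sample documents; "file" is the optional file name.
records = [
    {"text": "First example sentence.", "file": "new001.txt"},
    {"text": "Second example sentence."},
]
output_path = Path("upload_sample.json")
write_json_lines(records, output_path)
```

The resulting file can then be uploaded with the JSON mode selected.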

Create Project

Now create a tagging project that uses the previously created dataset. To do this, click "create project" and complete the form.

Open the new project by clicking its name.

First, create the labels: click "Edit data" in the top bar, create a label, set its name, color, and shortcut key, and then go back to "annotate data".

Then start annotating for classification or entities :)

Classification:

Entities:

Corpus Export

To export the full annotated corpus in a simple JSON format, go to "Dataset" in the top bar and open the dataset.

In the dataset's left menu, click "Export", select the annotation projects to export, and click "Download JSON file".

All documents with annotations will be exported.

Example format:

{
   "projects":[
      {
         "name":"News classification",
         "description":"news type classification",
         "split_pattern":"",
         "split_type":"split",
         "project_type":"DocumentClassification",
         "annotations":[
            {
               "label":"politics",
               "doc_id":1,
               "start":94,
               "end":316
            }
         ]
      },
      {
         "name":"News entities",
         "description":"news entities classification",
         "split_pattern":"",
         "split_type":"split",
         "project_type":"SequenceLabeling",
         "annotations":[
            {
               "label":"PERSON",
               "doc_id":1,
               "start":25,
               "end":47
            },
            {
               "label":"ORG",
               "doc_id":1,
               "start":85,
               "end":89
            },
            {
               "label":"DATE",
               "doc_id":1,
               "start":320,
               "end":329
            },
            {
               "label":"ORG",
               "doc_id":1,
               "start":348,
               "end":353
            },
            {
               "label":"ORG",
               "doc_id":1,
               "start":368,
               "end":398
            },
            {
               "label":"PERSON",
               "doc_id":1,
               "start":403,
               "end":415
            },
            {
               "label":"ORG",
               "doc_id":1,
               "start":520,
               "end":534
            }
         ]
      }
   ],
   "docs":[
      {
         "doc_id":1,
         "file":"new001.txt",
         "dataset":"news",
         "text":"In her video above, the Olympian Allyson Felix tells her story around pregnancy and Nike.\r\n\r\nIve always known that expressing myself could hurt my career. Ive tried not to show emotion, to anticipate what people expect from me and to do it. I dont like to let people down. But you cant change anything with silence.\r\n\r\nLast week, two of my former Nike teammates, the Olympian runners Alysia Montao and Kara Goucher, heroically broke their nondisclosure agreements with the company to share their pregnancy stories in a New York Times investigation.\r\n\r\nThey told stories we athletes know are true, but have been too scared to tell publicly: If we have children, we risk pay cuts from our sponsors during pregnancy and afterward. Its one example of a sports industry where the rules are still mostly made for and by men.\r\n"
      }
   ]
}
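
An exported file in this shape could be read back along the following lines. This is only a sketch: the helper names `iter_annotations` and `label_counts` and the inline sample data are hypothetical, and the key names are taken from the example above. The offset convention for "start" and "end" (0-based or 1-based, inclusive or exclusive end) is not stated in this README, so the sketch counts labels rather than slicing the text.

```python
import json
from collections import Counter


def iter_annotations(corpus):
    """Yield (project name, project type, annotation dict) for every annotation."""
    for project in corpus.get("projects", []):
        for annotation in project.get("annotations", []):
            yield project["name"], project["project_type"], annotation


def label_counts(corpus):
    """Count annotations per (project name, label) pair."""
    return Counter(
        (name, annotation["label"])
        for name, _project_type, annotation in iter_annotations(corpus)
    )


# A tiny inline corpus in the same shape as the example (hypothetical data).
corpus = json.loads("""
{
  "projects": [
    {"name": "News entities", "description": "", "split_pattern": "",
     "split_type": "split", "project_type": "SequenceLabeling",
     "annotations": [
       {"label": "PERSON", "doc_id": 1, "start": 0, "end": 5},
       {"label": "ORG", "doc_id": 1, "start": 15, "end": 19},
       {"label": "ORG", "doc_id": 2, "start": 0, "end": 4}
     ]}
  ],
  "docs": [
    {"doc_id": 1, "file": "a.txt", "dataset": "demo", "text": "Alice works at Acme."},
    {"doc_id": 2, "file": "b.txt", "dataset": "demo", "text": "Acme hired Bob."}
  ]
}
""")

# Index documents by id so each annotation can be joined to its source document.
docs_by_id = {doc["doc_id"]: doc for doc in corpus["docs"]}

counts = label_counts(corpus)
for (project_name, label), total in sorted(counts.items()):
    print(project_name, label, total)
```

For a real export, replace the inline string with the contents of the downloaded JSON file, for example via `json.load` on an open file handle.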
