Skip to content

gemunet/coranno

Repository files navigation

CORANNO

Corpus Annotation Tool

This is a tool for Natural Language Processing (NLP), which allows to create fully annotated corpora with Classification/Sentiment and POS/Entities/NER/, portable and reusable.

features:

  • Multiple datasets
  • Multiple Annotations projects
  • Entities Annotations (NER)
  • Classification multilabel and multiclass
  • Possibility to tag a selected text area
  • Custom tags creation
  • Search en dataset
  • Filter dataset for project by tag
  • Split sentences in project by regular expression
  • Views progress and stats
  • Corpus export in simple JSON format
  • Collaborative annotation
  • Ready for docker

Based on doccano https://github.com/chakki-works/doccano

Prerequisites

  • python3
  • Google Chrome(highly recommended)

Installing

pip install -r requirements.txt

Getting Started

  1. Run server

    python3 run.py 8000

  2. Go to page

    http://localhost:8000/

  3. Enter login credentials:

    • user: admin
    • pass: admin

Create Dataset

Now upload a dataset by click on "Create Dataset", complete the form and "create".

Alt text

Go to dataset by clicking on the name. Select upload mode and upload files.

Modes:

  • TXT: each line should contain a text sentence.
  • JSON: each line should contain a json object with at least one key 'text', which contains a text. can have an key "file", with the name of the file
  • PLAIN: one or more documents with plain text

Alt text

Create Project

Now create a tagging project that uses the previously created data set. To do this, click on "create project"

complete the form data

Alt text

Open the new project by clicking on the name of this.

First you must create the labels, for that click on "Edit data" in the top bar. Create your label, set a name, color and a shortcut key and go back to "annotate data".

start your annotations por classification or entities :)

Classification:

Alt text

Entities:

Alt text

Corpus Export

export full annotated corpus in simple JSON format, go to "Dataset" in top bar and open the dataset.

In dataset left menu, click in "Export", select annotation projects to export and click in "Download JSON file".

All documents with annotations will be exported.

example format

{
   "projects":[
      {
         "name":"News classification",
         "description":"news type classification",
         "split_pattern":"",
         "split_type":"split",
         "project_type":"DocumentClassification",
         "annotations":[
            {
               "label":"politics",
               "doc_id":1,
               "start":94,
               "end":316
            }
         ]
      },
      {
         "name":"News entities",
         "description":"news entities classification",
         "split_pattern":"",
         "split_type":"split",
         "project_type":"SequenceLabeling",
         "annotations":[
            {
               "label":"PERSON",
               "doc_id":1,
               "start":25,
               "end":47
            },
            {
               "label":"ORG",
               "doc_id":1,
               "start":85,
               "end":89
            },
            {
               "label":"DATE",
               "doc_id":1,
               "start":320,
               "end":329
            },
            {
               "label":"ORG",
               "doc_id":1,
               "start":348,
               "end":353
            },
            {
               "label":"ORG",
               "doc_id":1,
               "start":368,
               "end":398
            },
            {
               "label":"PERSON",
               "doc_id":1,
               "start":403,
               "end":415
            },
            {
               "label":"ORG",
               "doc_id":1,
               "start":520,
               "end":534
            }
         ]
      }
   ],
   "docs":[
      {
         "doc_id":1,
         "file":"new001.txt",
         "dataset":"news",
         "text":"In her video above, the Olympian Allyson Felix tells her story around pregnancy and Nike.\r\n\r\nIve always known that expressing myself could hurt my career. Ive tried not to show emotion, to anticipate what people expect from me and to do it. I dont like to let people down. But you cant change anything with silence.\r\n\r\nLast week, two of my former Nike teammates, the Olympian runners Alysia Montao and Kara Goucher, heroically broke their nondisclosure agreements with the company to share their pregnancy stories in a New York Times investigation.\r\n\r\nThey told stories we athletes know are true, but have been too scared to tell publicly: If we have children, we risk pay cuts from our sponsors during pregnancy and afterward. Its one example of a sports industry where the rules are still mostly made for and by men.\r\n"
      }
   ]
}

About

Corpus Annotation Tool

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published