Data annotation has become an integral part of many research projects. The lack of a proper open-source tool for crowdsourcing data annotation tasks is a challenge faced by research students.
The main objective of this study is to develop a web-based system in which crowdsourced users can participate in data annotation.
Data Annotation: Data annotation is the task of labeling data. E.g., given an image, the user annotates whether it is a picture of a cat or a picture of a dog.
Users of the system fall into two roles:
1. Task Authors
2. Annotators
The functionalities of the system include the following features:
Admin - Admin users can upload data annotation or data generation tasks. For a data annotation task, the admin can:
- Upload a description of the task.
- Upload the data to be annotated (text or images)
- Provide the names of the classes into which the data should be classified.
- Add a test to validate the ability of the data annotators.
- Approve a user as suitable for the annotation task.
- Provide how many users should annotate each data instance.
- Dynamically add more data over time.
- Check the progress.
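
The per-instance annotator count set by the admin has to hold even when several annotators request the same item at the same moment. The following is a minimal in-memory sketch of that check, not the system's actual implementation: the class `AnnotationQuota` and its methods are hypothetical names, and a deployed web system would more likely enforce the same rule with a database transaction or row lock rather than a Python lock.

```python
import threading
from collections import defaultdict


class AnnotationQuota:
    """Hypothetical in-memory tracker of who has claimed each data instance."""

    def __init__(self, annotators_per_instance: int):
        if annotators_per_instance < 1:
            raise ValueError("annotators_per_instance must be at least 1")
        self.annotators_per_instance = annotators_per_instance
        self._claims = defaultdict(set)  # instance_id -> set of annotator ids
        self._lock = threading.Lock()    # serialises the check-and-claim step

    def try_claim(self, instance_id: str, annotator_id: str) -> bool:
        """Reserve a slot; False if the quota is full or this annotator already has one."""
        with self._lock:
            claimed = self._claims[instance_id]
            if annotator_id in claimed:
                return False  # each annotator labels an instance at most once
            if len(claimed) >= self.annotators_per_instance:
                return False  # required number of annotators already reached
            claimed.add(annotator_id)
            return True

    def remaining(self, instance_id: str) -> int:
        """Number of annotator slots still open for this instance."""
        with self._lock:
            return self.annotators_per_instance - len(self._claims[instance_id])


# Ten annotators race for an instance that needs only two annotations.
quota = AnnotationQuota(annotators_per_instance=2)
results = []

def worker(i: int) -> None:
    results.append(quota.try_claim("img-1", f"user-{i}"))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the check and the claim happen under one lock, exactly two of the ten requests succeed regardless of thread scheduling.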
Annotator - An annotator can:
- Add personal details
- Annotate or generate data.
- Check their annotation history.
Special Features of the system:
- Handling concurrency issues (to prevent a data instance from being annotated by more than the required number of users).
- Limiting the number of times a user annotates a particular data instance.
- Mechanisms to calculate inter-annotator agreement of annotated data (kappa score, percent agreement, correlations, etc.).
- Other useful statistics (the class with the most annotations, the type of data generated most often).
- Dynamic user interfaces that can automatically adjust according to the type of the data (image/text) and the number of classes.
- Proper authentication and user validation mechanisms.
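
As an illustration of the agreement measures mentioned above, the sketch below computes percent agreement and Cohen's kappa for two annotators who labeled the same items. It is an assumption that Cohen's variant is the kappa intended; with more than two annotators per instance a measure such as Fleiss' kappa would be the usual choice. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two annotators chose the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout; kappa is
        # undefined (0/0) here, and this sketch reports 1.0 by choice.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


annotator_1 = ["cat", "cat", "dog", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog"]
agreement = percent_agreement(annotator_1, annotator_2)  # 3 of 4 items match
kappa = cohen_kappa(annotator_1, annotator_2)            # (0.75 - 0.5) / (1 - 0.5)
```

In this example the annotators agree on 75% of the items, but half of that agreement is expected by chance, so kappa is 0.5.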