Labeled data is a group of samples that have been tagged with one or more labels. A labeling system typically takes a set of unlabeled data and augments each piece of it with informative tags. Once a labeled dataset is available, a machine learning model can be trained on it so that, when new unlabeled data is presented to the model, a likely label can be predicted for each piece of that data.
The aim of this project is to design and implement a data labeling system in an object-oriented manner. The system has multiple labeling mechanisms: random, machine learning, simple search, user interface, and sentence labeling. In our program, instances of the dataset are labeled using the labeling mechanism provided by the user. There is also an authentication mechanism for human users.
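One possible way to keep the mechanisms interchangeable is a common interface with one class per mechanism. The sketch below is only an illustration under stated assumptions: the names `LabelingMechanism`, `RandomLabeling`, and `SimpleSearchLabeling` are hypothetical, and the matching rule used for simple search (first label whose name appears in the text) is assumed, not taken from the project.

```python
import random
from abc import ABC, abstractmethod


class LabelingMechanism(ABC):
    """Common interface for all labeling mechanisms (hypothetical name)."""

    @abstractmethod
    def label(self, instance_text, class_labels):
        """Return one label chosen from class_labels for the given instance."""


class RandomLabeling(LabelingMechanism):
    """Picks a label uniformly at random."""

    def __init__(self, seed=None):
        # A private generator keeps runs reproducible when a seed is given.
        self._rng = random.Random(seed)

    def label(self, instance_text, class_labels):
        return self._rng.choice(class_labels)


class SimpleSearchLabeling(LabelingMechanism):
    """Picks the first label whose name occurs in the text (assumed rule)."""

    def label(self, instance_text, class_labels):
        lowered = instance_text.lower()
        for candidate in class_labels:
            if candidate.lower() in lowered:
                return candidate
        # Assumption: fall back to the first label when nothing matches.
        return class_labels[0]
```

With this layout the rest of the system only depends on `LabelingMechanism.label`, so adding the machine learning or sentence labeling mechanism would mean adding one more subclass.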
Actors: User, Data Labeling System
Precondition: User must provide input files (config.json, dataset.json, machine learning data)
- User starts the system.
- System selects the dataset which is determined by config.json.
- System parses dataset.json and constructs the dataset.
- System asks for user name and password.
- User leaves user name and password blank.
- System determines the corresponding data labeling mechanism based on the user type.
- Bots start labeling instances one by one.
- System outputs the labeled dataset to output.json.
- System calculates and outputs performance metrics to metrics.json.
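The bot flow above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the dataset keys (`instances`, `labels`, `id`, `text`), the layout of output.json, and the single placeholder metric are all assumptions, since the file formats and the actual performance metrics are not specified here.

```python
import json
from pathlib import Path


def is_bot_login(user_name, password):
    """Blank user name and password mean that no human user is logging in."""
    return user_name.strip() == "" and password.strip() == ""


def label_with_bot(dataset, choose_label, out_dir):
    """Label every instance in turn, then write output.json and metrics.json.

    choose_label is any callable taking (instance_text, class_labels).
    """
    assignments = []
    for instance in dataset["instances"]:  # hypothetical key names
        label = choose_label(instance["text"], dataset["labels"])
        assignments.append({"instance_id": instance["id"], "label": label})

    # Placeholder metric only; the real metrics are not defined in this sketch.
    metrics = {"labeled_instances": len(assignments)}

    out_dir = Path(out_dir)
    (out_dir / "output.json").write_text(
        json.dumps({"assignments": assignments}, indent=2))
    (out_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return metrics
```

Passing the labeling rule in as a callable keeps the flow independent of which bot mechanism (random, machine learning, simple search, sentence labeling) is selected.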
Actors: User, Data Labeling System
Precondition: User must provide input files (config.json, dataset.json)
- User starts the system.
- System selects the dataset which is determined by config.json.
- System parses dataset.json and constructs the dataset.
- System asks for user name and password.
- If the user name and password do not match any credentials in config.json, the system prompts the user to enter them again.
- System determines the corresponding data labeling mechanism based on the user type.
- System outputs the labeled dataset to output.json.
- System outputs performance metrics to metrics.json.
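The credential check and re-prompt step in this flow could look like the sketch below. It assumes, hypothetically, that config.json has already been parsed into a list of user entries with `name`, `password`, and `type` fields; the real configuration layout may differ. Blank credentials (the bot case) are not handled here.

```python
def find_user(users, user_name, password):
    """Return the matching user entry, or None if the credentials are unknown."""
    for user in users:
        if user["name"] == user_name and user["password"] == password:
            return user
    return None


def authenticate(users, read_line=input):
    """Ask for credentials until they match an entry from the configuration.

    read_line is injectable so the loop can be driven without a terminal.
    """
    while True:
        user_name = read_line("User name: ")
        password = read_line("Password: ")
        user = find_user(users, user_name, password)
        if user is not None:
            return user
        print("Invalid user name or password, please try again.")
```

The returned entry's `type` field (an assumed name) is what the system would then use to determine the corresponding labeling mechanism.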