nerblackbox adheres to Semantic Versioning.
- AnnotationTool.download() returns file path (#16)
- Upgraded dependencies (#15)
- Support for additional model architectures: RoBERTa, DeBERTa
- Documentation: reproduction of results
- Renamed class: Experiment -> Training
- Renamed training parameters: prune_ratio_train -> train_fraction (+ same for val & test)
- annotation tool integration (Doccano and LabelStudio)
- demonstration notebooks
- restructured docs
- reduced CLI to two commands (nerblackbox mlflow & nerblackbox tensorboard)
- dropped support for python version 3.11
- upgraded dependencies (fixing potential security vulnerabilities)
- Model: prediction on file
- Model: evaluation of any model on any dataset
- API: complete renewal using classes Store, Dataset, Experiment, Model
- Supported python versions: 3.8 to 3.11
- Dataset: no shuffling by default
- Model: base model with NER classification head can be loaded
- NerModelPredict: GPU batch inference
- TextEncoder class for custom data preprocessing
- HuggingFace datasets integration: enable subsets
- HuggingFace datasets: support for sucx_ner
- NerModelPredict: improved inference time and data post-processing
- API: load best model of experiment directly (instead of via ExperimentResults)
- upgrade pytorch-lightning
- Adaptive fine-tuning
- Integration of HuggingFace Datasets
- Integration of raw (unpretokenized) data
- Integration of different annotation schemes and seamless conversion between them
- Option to specify experiments dynamically (instead of using a config file)
- Option to add special tokens
- New built-in dataset: Swe-NERC
- Use seeds for reproducibility
- Validation only on a single metric (e.g. loss) during training
- Shuffling of all datasets (train, val, test)
- Results: epochs start counting from 1 instead of 0
- Results: compute standard version of macro-average, plus number of contributing classes
- Results: add precision and recall
- All models that are based on a WordPiece tokenizer work
- Early stopping: use the last model instead of the stopped-epoch model
- NerModelPredict: predict on token or entity level
- Evaluation entity level: compute metrics for single labels
- Evaluation token level: confusion matrix
- Evaluation token & entity level: number of predicted classes
- Evaluation token level: use plain annotation scheme
- Migrate to pytorch-lightning==1.3.7, seqeval==1.2.2, mlflow==1.8.0
- special [NEWLINE] token can be used in input data
- CLI command "predict_proba"
- long input samples are automatically sliced before being sent to the model
- NerModelPredict: unknown tokens are restored (in external mode)
- NerModelPredict for local pretrained models
- NerModelEvaluation
- Swedish datasets (SIC & SUC 3.0)
- Python 3.9 support
- CLI command "get_experiments_results"
- CLI command "nerbb download" (and corresponding python method) to download built-in datasets
- CLI command "nerbb init" (and corresponding python method) no longer automatically downloads built-in datasets
- NerModelPredict: option to predict probabilities instead of tags
- Exposure of main python classes at top level of package
- Renamed LightningNerModelPredict -> NerModelPredict
- Renamed LightningNerModelTrain -> NerModelTrain
- use of local pretrained models
- loading of NerModelPredict from checkpoint
- New CLI command "nerbb clear_data" to clear checkpoints and results
- Dependencies cleaned up and simplified
- Experiment configuration file "exp_test.ini" improved
- Boolean CLI options "--verbose" and "--fp16"
First open-sourced version.