- Named Entity Recognition using Neural Networks and Transformers Approach
- 1. Problem Statement
- 2. Data Description
- 3. Topic Modelling
- 4. EDA
- 5. Modelling and Evaluation
- 6. Results
This dataset contains release versions of the Georgetown University Multilayer Corpus (GUM), a corpus of English texts from twelve written and spoken text types. The corpus was created as part of the course LING-367 (Computational Corpus Linguistics) at Georgetown University. Our aim is to apply two different kinds of classifiers to the named entity recognition (NER) task on this data.
Correctly classify each token into one of the 23 NER classes.
Data is obtained from this repo.
- Number of instances - 44111 entries (Train), 18236 entries (Test)
- Number of features - 2
- token: string feature
- ner_tag: a classification label with 23 classes
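As a rough illustration of the two features, the sketch below maps string tags to the integer ids a classifier would consume. It assumes the tags are stored as strings in a BIO-style scheme; the tag names and the four sample rows are hypothetical and are not taken from the dataset.

```python
# Hypothetical rows: one record per token, with its NER tag.
rows = [
    {"token": "Georgetown", "ner_tag": "B-organization"},
    {"token": "University", "ner_tag": "I-organization"},
    {"token": "is", "ner_tag": "O"},
    {"token": "in", "ner_tag": "O"},
]

# Build a stable label vocabulary (sorted so ids are reproducible).
label2id = {lab: i for i, lab in enumerate(sorted({r["ner_tag"] for r in rows}))}
id2label = {i: lab for lab, i in label2id.items()}

# Encode the tag column as integer class ids.
encoded = [label2id[r["ner_tag"]] for r in rows]
print(label2id)
print(encoded)
```

On the full data, the same mapping would be expected to contain 23 entries, one per class.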
The topics are analyzed using two methods:
- Latent Dirichlet Allocation (LDA)
- Non-negative Matrix Factorization (NMF)
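To make the second method concrete, here is a minimal sketch of NMF in plain Python using the standard multiplicative update rules. It is not the project's implementation: in practice a library routine would normally be used, and the four-document corpus, the helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`) and the choice of two topics are all assumptions made for illustration. LDA is not shown.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) as W (docs x topics) times H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_terms)]
             for i in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_topics)]
             for i in range(n_docs)]
    return w, h

def frobenius_error(v, w, h):
    """Reconstruction error ||V - WH|| in the Frobenius norm."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

# Hypothetical toy corpus; the real input would be the GUM texts.
docs = [
    "court judge law trial",
    "law court ruling judge",
    "recipe flour sugar oven",
    "oven bake flour recipe",
]
vocab = sorted({t for d in docs for t in d.split()})
counts = [[d.split().count(t) for t in vocab] for d in docs]

w, h = nmf(counts, n_topics=2)
for k, row in enumerate(h):
    top_terms = [t for _, t in sorted(zip(row, vocab), reverse=True)[:3]]
    print(f"topic {k}: {top_terms}")
```

Each row of `H` weights the vocabulary for one topic, and each row of `W` gives a document's mixture over topics. Both factors stay non-negative because the updates only multiply by non-negative ratios.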
- Algorithms used
- BiLSTM
- BERT
- Metrics used: Accuracy, Precision, Recall, F1-Score
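The metrics listed above can be sketched at the token level as follows. This is an illustration under stated assumptions, not the project's evaluation code: scoring per token with macro-averaged F1 is an assumption (NER is also often scored per entity span), and the function name `token_metrics` and the tag names in the example are hypothetical.

```python
from collections import Counter

def token_metrics(y_true, y_pred):
    """Return (accuracy, macro_f1, per_class) for two aligned tag sequences.

    per_class maps each label to a (precision, recall, f1) tuple.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[t] += 1  # missed the true label t
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for lab in labels:
        n_pred = tp[lab] + fp[lab]
        n_true = tp[lab] + fn[lab]
        prec = tp[lab] / n_pred if n_pred else 0.0
        rec = tp[lab] / n_true if n_true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[lab] = (prec, rec, f1)
    accuracy = sum(tp.values()) / len(y_true)
    macro_f1 = sum(f for _, _, f in per_class.values()) / len(labels)
    return accuracy, macro_f1, per_class

# Hypothetical gold and predicted tags for six tokens.
y_true = ["B-person", "I-person", "O", "O", "B-place", "O"]
y_pred = ["B-person", "O", "O", "O", "B-place", "B-place"]

accuracy, macro_f1, per_class = token_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```

Accuracy alone can look high when most tokens carry the majority tag, which is one reason to report per-class precision, recall and F1 alongside it.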