Named Entity Recognition for GUM

This dataset contains release versions of the Georgetown University Multilayer Corpus (GUM), a corpus of English texts from twelve written and spoken text types. The corpus is created as part of the course LING-367 (Computational Corpus Linguistics) at Georgetown University. Our aim is to use two different kinds of classifiers to perform named entity recognition (NER) on this corpus.

1. Problem Statement

Correctly classify each token into one of the 23 NER tag classes.

2. Data Description

Data is obtained from this repo.

Number of instances - 44111 entries (Train), 18236 entries (Test)
Number of classes - 23
Attribute Information

Inputs
- token: string feature
Output
- ner_tag: a classification label, 23 classes
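
The input/output layout above can be sketched as follows. This is a minimal illustration, not the repository's preprocessing code: the sample rows, the tag strings, and the helper names `build_index` and `encode` are assumptions introduced here, and the dataset's actual label strings may differ.

```python
from collections import Counter

# Hypothetical sample rows in (token, ner_tag) form. The tag names are
# illustrative only and may not match the dataset's actual labels.
rows = [
    ("Georgetown", "B-organization"),
    ("University", "I-organization"),
    ("is", "O"),
    ("in", "O"),
    ("Washington", "B-place"),
    (".", "O"),
]


def build_index(items, specials=()):
    """Map each distinct item to an integer id, reserving the first ids for specials."""
    index = {s: i for i, s in enumerate(specials)}
    for item in items:
        if item not in index:
            index[item] = len(index)
    return index


tokens = [tok for tok, _ in rows]
tags = [tag for _, tag in rows]

# Lowercased token vocabulary with padding and unknown-token entries.
token2id = build_index((t.lower() for t in tokens), specials=("<pad>", "<unk>"))
# Sorted tag set so the label ids are stable across runs.
tag2id = build_index(sorted(set(tags)))


def encode(token_seq, tag_seq):
    """Turn parallel token/tag lists into parallel integer id lists."""
    x = [token2id.get(t.lower(), token2id["<unk>"]) for t in token_seq]
    y = [tag2id[t] for t in tag_seq]
    return x, y


x, y = encode(tokens, tags)
tag_counts = Counter(tags)  # class frequencies, useful for spotting imbalance
```

On the full data, `tag2id` would be expected to hold 23 entries, one per class.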

3. Topic Modelling

The topics are analyzed using two methods:

- Latent Dirichlet Allocation (LDA)
- Non-negative Matrix Factorization (NMF)
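
To make the NMF idea concrete, here is a small dependency-free sketch that factors a document-term count matrix V into non-negative factors W (documents x topics) and H (topics x terms) using multiplicative updates. It is not the notebook's implementation, which more likely relies on a library; the toy vocabulary, the counts, and all function names are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) as W (docs x topics) times H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


def top_terms(h, vocab, k=3):
    """List the k highest-weighted vocabulary terms for each topic row of H."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:k]]
            for row in h]


# Toy document-term counts (hypothetical data, 4 documents x 6 terms).
vocab = ["match", "team", "goal", "election", "vote", "party"]
docs = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
]

w, h = nmf(docs, n_topics=2)
topics = top_terms(h, vocab)
```

Each row of `h` weights the vocabulary for one topic, and each row of `w` says how strongly a document draws on each topic. LDA produces a comparable document-topic and topic-term view, but as probability distributions from a generative model rather than from a matrix factorization.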

4. EDA

5. Modelling Evaluation

Algorithms used
- Bi-LSTM
- BERT

Metrics used: Accuracy, Precision, Recall, F1-Score
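
As a reference for how the listed metrics relate to each other, the sketch below computes them at the token level from parallel lists of gold and predicted tags. The README does not say whether evaluation is token-level or entity-level, so token-level scoring is an assumption here, as are the function name `token_level_metrics` and the example tags.

```python
def token_level_metrics(y_true, y_pred):
    """Return accuracy, per-class precision/recall/F1, and macro-averaged F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Define a score as 0.0 when its denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(per_class)
    return accuracy, per_class, macro_f1


# Hypothetical gold and predicted tags for a six-token sentence.
y_true = ["O", "B-person", "I-person", "O", "B-place", "O"]
y_pred = ["O", "B-person", "O", "O", "B-place", "B-place"]

accuracy, per_class, macro_f1 = token_level_metrics(y_true, y_pred)
```

In this example accuracy is 4/6, while macro F1 is 7/12 because the missed `I-person` tag contributes an F1 of zero. With 23 classes and a dominant majority tag, accuracy alone can look high, which is why per-class precision, recall, and F1 are reported alongside it.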

Repository: gabrielecola/NER