Natural language processing projects based on the one-million-posts dataset.
- Install pyenv.
- Install Python 3.8.5 via `pyenv install 3.8.5`.
- Run `make setup`.
See SETUP.md.
The notebooks are pushed as `.py` files in the Python percent script format (we like meaningful diffs).
These files were created with the Jupyter plugin jupytext, which is installed automatically when you run `make setup` as part of the basic setup.
For the full notebook experience, open them in Jupyter. Even without jupytext, you can run them like any other Python file via `python file_name.py`.
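As an illustration of the percent script format mentioned above, here is a minimal sketch of what such a file can look like. The cell contents, the sample posts, and the `count_words` helper are hypothetical and not taken from this repo; only the `# %%` cell markers are part of the format itself.

```python
# %% [markdown]
# # Hypothetical example notebook
# Markdown cells are stored as comments, so the file stays valid Python.

# %%
# A code cell: everything up to the next "# %%" marker belongs to it.
# The posts below are made-up sample data, not from the dataset.
posts = ["Great article!", "I disagree.", "Thanks for sharing."]

# %%
def count_words(text):
    """Return the number of whitespace-separated tokens in a post."""
    return len(text.split())


word_counts = [count_words(post) for post in posts]
print(word_counts)
```

Because the cell markers are ordinary comments, the file runs top to bottom as a plain script, and with jupytext installed the same file opens in Jupyter as a notebook with three cells.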
The presentations are found in `./presentations/`.
Presentation file | Description |
---|---|
One Million Posts - Annotation composition.pdf | EDA concerning tickets #24 and #25 |
The models' code is found in `./modeling/` in this repo.
It is pushed as `.py` files. See Setup - Modeling.
Model | Description |
---|---|
Zero Shot Classifier | |
Support Vector classifier | |
RandomForest Classifier | |
Naive Bayes | Naive Bayes classifier |
The notebooks are currently found in the main folder of this repo.
They are pushed as `.py` files. See Setup - Notebooks.