
lima-tango/one-million-posts

 
 


one-million-posts

Natural language processing projects based on the one-million-posts dataset.

Setup

  1. Install pyenv.
  2. Install Python 3.8.5 via pyenv install 3.8.5.
  3. Run make setup.

Setup - Modeling

See SETUP.md.

Setup - Notebooks

The notebooks are pushed as .py files in the jupytext percent format (we like meaningful diffs).
These files were created with the Jupyter plugin jupytext, which is installed automatically when you run make setup as part of the basic setup. For the full notebook experience, open them in Jupyter. Even without jupytext, you can run them like any other Python file via python file_name.py.
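For readers who have not seen the percent format, here is a minimal sketch of what such a file can look like. It is not taken from this repo: the sample posts and variable names are made up for illustration. Cells are delimited by `# %%` comment lines, and markdown cells are marked with `# %% [markdown]`, so the whole file remains a valid Python script.

```python
# %% [markdown]
# # Example notebook
# A markdown cell: every line is a Python comment, so the file stays a valid script.

# %%
# A code cell. The sample posts below are hypothetical, not from the dataset.
posts = ["First example post", "Second, somewhat longer example post"]

# %%
# Another code cell; it can use names defined in earlier cells.
word_counts = [len(post.split()) for post in posts]
print(word_counts)
```

Run as a plain script, the cell markers are ignored as ordinary comments. With jupytext installed, a file like this can also be converted to an .ipynb notebook with jupytext --to notebook file_name.py.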

Presentations

The presentations are found in ./presentations/

| Presentation file | Description |
| --- | --- |
| One Million Posts - Annotation composition.pdf | EDA concerning tickets #24, #25 |

Modeling

The models' code is found in ./modeling/ in this repo. The models are pushed as .py files. See Setup - Modeling.

| Model | Description |
| --- | --- |
| Zero Shot Classifier | |
| Support Vector classifier | |
| RandomForest Classifier | |
| Naive Bayes | Naive Bayes classifier |

Data analysis

The notebooks are currently found in the main folder of this repo. They are pushed as .py files. See Setup - Notebooks.
