This repository contains the source code for my Master's thesis, "Towards Fairness in NLP: Neural Methods for Flavor Detection and Bias Mitigation".
This folder contains the evaluation results described in Chapter 5: Results and Evaluation. The human evaluation forms and the analysis sheet are located here. The Perspective API evaluation results and scripts are located here. Running the scripts requires your own Perspective API key.
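As a rough illustration of what a Perspective API request looks like, here is a minimal sketch that uses only the Python standard library. It is not the evaluation script from this repository. The helper names `build_payload` and `score_text`, the environment variable `PERSPECTIVE_API_KEY`, and the choice of the `TOXICITY` attribute are assumptions made for this example.

```python
import json
import os
import urllib.request

# Public REST endpoint of the Perspective API (Comment Analyzer).
ENDPOINT = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_payload(text, attributes=("TOXICITY",)):
    """Build the JSON body for a comments:analyze request."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {name: {} for name in attributes},
    }


def score_text(text, api_key):
    """Send one request and return the summary score per requested attribute."""
    request = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return {
        name: scores["summaryScore"]["value"]
        for name, scores in result["attributeScores"].items()
    }


if __name__ == "__main__":
    # PERSPECTIVE_API_KEY is a hypothetical variable name chosen for this sketch.
    key = os.environ.get("PERSPECTIVE_API_KEY")
    if key:
        print(score_text("This is a neutral sentence.", key))
    else:
        print("Set PERSPECTIVE_API_KEY to run the request.")
```

Reading the key from the environment keeps it out of the repository. The scripts in this folder may expect the key to be supplied differently.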
This folder contains MODULAR and CONCURRENT, which are described in Chapter 4: Experimental Setup and based on the paper "Automatically Neutralizing Subjective Bias in Text". You can find the original repository here.
- Set up your environment:
$ virtualenv -p python3 .venv-nb
$ source .venv-nb/bin/activate
$ pip install -r req-nb.txt
$ python
>>> import nltk; nltk.download("punkt")
- Download the Wiki Neutrality Corpus (WNC) data here. Extract it to the data folder.
- Download the MODULAR checkpoint here and save it to the model folder, or train your own model using this script. Please contact me if you need a checkpoint for CONCURRENT; alternatively, you can train your own model using this script.
- Use the model interface or the inference scripts for inference.
This folder contains STRAP, which is described in Chapter 4: Experimental Setup and based on the paper "Reformulating Unsupervised Style Transfer as Paraphrase Generation". You can find the original repository here.
- Set up your environment:
$ virtualenv -p python3 .venv-stp
$ source .venv-stp/bin/activate
$ pip install transformers
$ pip install torch==1.6.0+cu92 torchvision==0.7.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install -r req-stp.txt
- Please contact me if you need the preprocessed WNC data.
- You can download the Diverse Paraphraser (paraphraser_gpt2_large) here. Save it to the models folder. Please contact me if you need a checkpoint for the Inverse Paraphraser trained on WNC. If you want to train it yourself, please follow the steps described here.
- You can use the command-line interface to interact with the model. It is documented here.
- Evaluation requires the SIM model (Wieting et al., 2019). You can download it here and save it to this folder.
- Set up your environment:
$ virtualenv -p python3.10 .venv-pg
$ source .venv-pg/bin/activate
$ pip install -r req-pg.txt
- Download pretrained GloVe word embeddings and extract the .zip file. Save glove.6B.50d.txt to the data folder.
- Follow the descriptions in the notebooks.
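To check that the GloVe file was extracted to the right place, a minimal sketch like the one below can read it into a dictionary. It relies on the plain-text GloVe format (one word per line, followed by its space-separated vector values). The function name `load_glove`, the optional `vocab` filter, and the relative path `data/glove.6B.50d.txt` are assumptions for illustration; the notebooks may load the embeddings differently.

```python
import os


def load_glove(path, vocab=None):
    """Read GloVe vectors from a text file into a dict of word -> list of floats.

    If vocab is given, only words contained in it are kept.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if vocab is not None and word not in vocab:
                continue
            embeddings[word] = [float(value) for value in values]
    return embeddings


if __name__ == "__main__":
    # Assumed location, following the setup step above.
    glove_path = os.path.join("data", "glove.6B.50d.txt")
    if os.path.exists(glove_path):
        vectors = load_glove(glove_path, vocab={"bias", "neutral"})
        for word, vector in vectors.items():
            print(word, len(vector))
    else:
        print(f"{glove_path} not found; download and extract GloVe first.")
```

Passing a small `vocab` keeps memory use low when only a handful of words needs to be inspected.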