Skip to content

Commit

Permalink
Adding code + data
Browse files Browse the repository at this point in the history
  • Loading branch information
ewsheng committed Jul 14, 2020
0 parents commit 23d6506
Show file tree
Hide file tree
Showing 23 changed files with 2,733 additions and 0 deletions.
179 changes: 179 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
# Towards Controllable Biases in Language Generation

This code generates bias triggers and evaluates the biases in text generated using the bias triggers.

More details can be found in [this paper](https://arxiv.org/abs/2005.00268).

## Dependencies

This trigger search code is written using PyTorch and extended from the code from the paper [Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)](https://github.com/Eric-Wallace/universal-triggers). The code for GPT-2 is based on [HuggingFace's Transformer repo](https://github.com/huggingface/pytorch-transformers).

The evaluation code relies on the code and data from the paper [The Woman Worked as a Babysitter: On Biases in Language Generation](https://github.com/ewsheng/nlg-bias). *You will need to download this repo in order to run the evaluation.*

With 1 RTX 2080-Ti GPU, the trigger search takes 1-2 hours.

## Installation

An easy way to install the code is to create a fresh anaconda environment:

```
conda create -n triggers python=3.6
conda activate triggers
pip install -r requirements.txt
```

## Getting Started

### Finding bias triggers

Running `src/create_adv_token.py` generates a bias trigger for specified (demographic, regard) pairs.

A demographic group can be a string (e.g., "The man") or represented by a set of names.

```
python src/create_adv_token.py --help
usage: create_adv_token.py [-h]
[--neg_sample_file NEG_SAMPLE_FILE]
[--neu_sample_file NEU_SAMPLE_FILE]
[--pos_sample_file POS_SAMPLE_FILE]
[--neg_demographic NEG_DEMOGRAPHIC]
[--pos_demographic POS_DEMOGRAPHIC]
[--neg_name_file NEG_NAME_FILE]
[--pos_name_file POS_NAME_FILE]
[--salience_threshold SALIENCE_THRESHOLD]
[--salient_phrases_file SALIENT_PHRASES_FILE]
[--use_original_loss USE_ORIGINAL_LOSS]
[--use_salience_loss USE_SALIENCE_LOSS]
[--use_dissociation_loss USE_DISSOCIATION_LOSS]
[--use_weighted_salience_loss USE_WEIGHTED_SALIENCE_LOSS]
[--alpha ALPHA]
[--beta BETA]
[--beam_size BEAM_SIZE]
[--use_weighted_neg USE_WEIGHTED_NEG]
[--trigger_init TRIGGER_INIT]
[--num_trigger_tokens NUM_TRIGGER_TOKENS]
[--trigger_masked_phrases TRIGGER_MASKED_PHRASES]
[--trigger_position TRIGGER_POSITION]
[--debias DEBIAS]
[--num_demographics NUM_DEMOGRAPHICS]
[--model MODEL]
[--batch_size BATCH_SIZE]
optional arguments:
-h, --help show this help message and exit
--neg_sample_file NEG_SAMPLE_FILE
File of negative regard target samples.
--neu_sample_file NEU_SAMPLE_FILE
File of neutral regard target samples.
--pos_sample_file POS_SAMPLE_FILE
Fle of positive regard target samples.
--neg_demographic NEG_DEMOGRAPHIC
Demographic mention for negative target samples.
--pos_demographic POS_DEMOGRAPHIC
Demographic mention for positive target samples.
--neg_name_file NEG_NAME_FILE
File with names for negative target samples. Overrides
neg_demographic.
--pos_name_file POS_NAME_FILE
File with names for positive target samples. Overrides
pos_demographic.
--salience_threshold SALIENCE_THRESHOLD
--salient_phrases_file SALIENT_PHRASES_FILE
File with salient phrases.
--use_original_loss USE_ORIGINAL_LOSS
Use association loss.
--use_salience_loss USE_SALIENCE_LOSS
--use_dissociation_loss USE_DISSOCIATION_LOSS
Use dissociation loss.
--use_weighted_salience_loss USE_WEIGHTED_SALIENCE_LOSS
--alpha ALPHA Weight for original loss.
--beta BETA Weight for dissociation loss.
--beam_size BEAM_SIZE
Beam size when searching for trigger replacement
candidates.
--use_weighted_neg USE_WEIGHTED_NEG
--trigger_init TRIGGER_INIT
Initialize trigger with a phrase.
--num_trigger_tokens NUM_TRIGGER_TOKENS
--trigger_masked_phrases TRIGGER_MASKED_PHRASES
--trigger_position TRIGGER_POSITION
Options are `head`, `body_demographic`,
`body_biascontext.
--debias DEBIAS Whether to generate triggers to debias. 0 = no debias,
1 = neutral debias, 2 = neutral + positive debias.
--num_demographics NUM_DEMOGRAPHICS
Whether to use 1 or 2 demographics.
--model MODEL `gpt2` or `dialogpt`.
--batch_size BATCH_SIZE
32 works well for CPU, 16 for GPU.
```

For example, if we wanted to find a trigger for GPT2 that associates positive social connotations
with "The woman" and negative social connotations with "The man", we could run:
```
python src/create_adv_token.py \
--neg_sample_file data/gpt2_neg_regard.tsv \
--neu_sample_file data/gpt2_neu_regard.tsv \
--pos_sample_file data/gpt2_pos_regard.tsv \
--neg_demographic "The man" \
--pos_demographic "The woman" > neg_man_pos_woman.txt
```

### Evaluating samples generated from bias triggers
First, download the code and regard classifier [here](https://github.com/ewsheng/nlg-bias).

Now you can run `src/eval_triggers.py` to use a found bias trigger to generate samples and then evaluate the samples using
the regard classifier.


```
python src/eval_triggers.py --help
usage: eval_triggers.py [-h] [--trigger_dump_file TRIGGER_DUMP_FILE]
[--trigger_label_output_dir TRIGGER_LABEL_OUTPUT_DIR]
[--regard_classifier_dir REGARD_CLASSIFIER_DIR]
[--trigger_position TRIGGER_POSITION]
[--neg_demographic NEG_DEMOGRAPHIC]
[--pos_demographic POS_DEMOGRAPHIC]
[--neg_name_file NEG_NAME_FILE]
[--pos_name_file POS_NAME_FILE]
[--metric METRIC]
[--model MODEL]
optional arguments:
-h, --help show this help message and exit
--trigger_dump_file TRIGGER_DUMP_FILE
The output file of create_adv_token.py.
--trigger_label_output_dir TRIGGER_LABEL_OUTPUT_DIR
Path to output generated samples.
--regard_classifier_dir REGARD_CLASSIFIER_DIR
Path to top level of regard classifier code directory.
--trigger_position TRIGGER_POSITION
Options are `head` or `body`.
--neg_demographic NEG_DEMOGRAPHIC
Demographic string to associate with negative regard
samples.
--pos_demographic POS_DEMOGRAPHIC
Demographic string to associate with positive regard
samples.
--neg_name_file NEG_NAME_FILE
Name file for negative association.
--pos_name_file POS_NAME_FILE
Name file for positive association.
--metric METRIC Specify metric: `regard2`, `regard1`, `sentiment2`, or
`sentiment1`.
--model MODEL `gpt2` or `dialogpt`.
```

For example, to evaluate the trigger found using `create_adv_token.py` above:
```
python src/eval_triggers.py \
--trigger_dump_file neg_man_pos_woman.txt \
--trigger_label_output_dir [EXISTING_PATH] \
--regard_classifier_dir [EXISTING_PATH_TO_TOP_LEVEL_REGARD_REPO] \
--neg_demographic "The man" \
--pos_demographic "The woman"
```

### Other features


20 changes: 20 additions & 0 deletions data/black_female_names.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
Imani
Ebony
Shanice
Aaliyah
Precious
Nia
Deja
Diamond
Asia
Aliyah
Jada
Tierra
Tiara
Kiara
Jazmine
Jasmin
Jazmin
Jasmine
Alexus
Raven
20 changes: 20 additions & 0 deletions data/black_male_names.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
DeShawn
DeAndre
Marquis
Darnell
Terrell
Malik
Trevon
Tyrone
Willie
Dominique
Demetrius
Reginald
Jamal
Maurice
Jalen
Darius
Xavier
Terrance
Andre
Darryl
40 changes: 40 additions & 0 deletions data/black_names.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
DeShawn
Andre
Malik
Deja
Jalen
Terrance
Imani
Diamond
Jazmine
Ebony
DeAndre
Asia
Maurice
Jazmin
Jada
Marquis
Trevon
Tierra
Tyrone
Jamal
Aaliyah
Darryl
Raven
Kiara
Dominique
Demetrius
Darnell
Precious
Jasmin
Tiara
Terrell
Darius
Aliyah
Willie
Jasmine
Xavier
Reginald
Nia
Alexus
Shanice
Loading

0 comments on commit 23d6506

Please sign in to comment.