This repository implements the paper *SaGE: Evaluating Moral Consistency in Large Language Models* by Vamshi B, Sreeram V, Priyanshul G, PK, and Manas Gaur, accepted at LREC-COLING 2024.
- The input is a list of questions and the model to be evaluated
- Paraphrases are generated for each question
- Model output is generated for each paraphrase
- RoTs (Rules of Thumb) are generated for each (question, output) pair
- The generated data is passed to SaGE
- Consistency metrics are returned (a conceptual sketch of this flow is shown below)
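For orientation, here is a conceptual sketch of that flow. The helper names (`generate_paraphrases`, `generate_rot`, `run_pipeline`) are hypothetical placeholders, not the library's internals; only `sage.score` (shown later) is the real API.

```python
# Conceptual sketch of the evaluation flow; helpers are stubbed placeholders.

def generate_paraphrases(question):
    # Placeholder: the real pipeline uses a paraphrasing model (see scripts/para_generation.py).
    return [question, f"Put differently: {question}"]

def generate_rot(question, answer):
    # Placeholder: the real pipeline generates a Rule of Thumb (RoT) per (question, answer) pair.
    return f"A rule of thumb relevant to: {question}"

def run_pipeline(questions, get_response, use_rots=True):
    records = []
    for question in questions:
        for para in generate_paraphrases(question):   # paraphrases per question
            answer = get_response(para)                # model output per paraphrase
            rot = generate_rot(para, answer) if use_rots else None  # RoT per pair
            records.append({"question": question, "paraphrase": para,
                            "answer": answer, "rot": rot})
    return records  # this collected data is what SaGE scores for consistency

records = run_pipeline(["What makes us human?"], lambda q: "An example answer.")
print(records)
```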
The SaGE library is designed to evaluate the consistency of generative models. It takes as input a list of questions (strings) and a function that returns your model's response to a given question, and it returns the SaGE score.

Run `pip install -e .` to install the library locally.

The main entry point is `score(questions, get_response, use_rots=True)`, as in the following example:
```python
import openai
import time
from sage import sage

openai.api_key = "{OPENAI_API_KEY}"

# Note: this example uses the pre-1.0 interface of the `openai` Python package.
def get_gpt_response(question, model_name="gpt-3.5-turbo", temperature=0.8):
    prompt = f"""Answer the following question in one paragraph, be concise.
Question: {question}"""
    for i in range(5):  # 5 attempts with exponential backoff
        try:
            response = openai.ChatCompletion.create(
                model=model_name,
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt},
                ],
                temperature=temperature,
            )
            return response['choices'][0]['message']['content'].strip()
        except openai.error.OpenAIError as e:
            if i == 4:  # On the last attempt, raise the exception
                raise e
            else:
                time.sleep((2 ** i) + 1)  # Exponential backoff

questions = ['What makes us human?']

# with RoTs
results = sage.score(questions, get_gpt_response, use_rots=True)
print(results)

# without RoTs
results = sage.score(questions, get_gpt_response, use_rots=False)
print(results)
```
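`get_response` can be any callable that maps a question string to an answer string, so other backends work as well. Below is a minimal sketch for a local HuggingFace model; it assumes the `transformers` package is installed, and the model choice and generation settings are only illustrative.

```python
from transformers import pipeline
from sage import sage

# Illustrative only: any text-generation model can be wrapped the same way.
generator = pipeline("text-generation", model="gpt2")

def get_local_response(question):
    prompt = (
        "Answer the following question in one paragraph, be concise.\n"
        f"Question: {question}\nAnswer:"
    )
    output = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.8)
    # The pipeline returns the prompt plus the continuation; keep only the continuation.
    return output[0]["generated_text"][len(prompt):].strip()

results = sage.score(["What makes us human?"], get_local_response, use_rots=True)
print(results)
```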
This repository only contains the main dataset. To find the data required to reproduce the results, please read `data/README.md`.
```
data/
├── README.md
└── mcc
    ├── mcc.csv
    └── mcc_moral.csv
```
- `README.md` - Dataset card; contains instructions for obtaining the data needed for reproducibility.
- `mcc.csv` - Contains 50,000 moral scenarios, part of the Moral Consistency Corpus (MCC) dataset.
- `mcc_moral.csv` - Contains the moral categories for each question in MCC.
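Once the data is in place, a quick way to inspect it is sketched below (assumes pandas; the paths follow the layout above, and no column names are assumed):

```python
import pandas as pd

# Paths follow the directory layout shown above.
mcc = pd.read_csv("data/mcc/mcc.csv")
mcc_moral = pd.read_csv("data/mcc/mcc_moral.csv")

print(mcc.shape)          # expect roughly 50,000 moral scenarios
print(list(mcc.columns))  # inspect column names before using them
print(mcc_moral.head())   # moral category annotations per question
```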
```
scripts/
├── edge_generation.py
├── hella_swag
│   └── pipeline.py
├── model_output.py
├── pair_generation.py
├── para_generation.py
├── pipeline.py
├── rot_generation.py
└── t_analysis
    ├── tedge_generation.py
    └── tpair_generation.py
```
- `edge_generation.py` - Script for generating edges, which are used when calculating consistency scores.
- `hella_swag/pipeline.py` - Specialized pipeline script for the HellaSwag dataset.
- `model_output.py` - Script for generating model outputs for input questions.
- `pair_generation.py` - Script for generating (question, model output) pairs.
- `para_generation.py` - Script for generating paraphrases for questions.
- `pipeline.py` - General pipeline script covering the common data-processing steps described above.
- `rot_generation.py` - Script for generating RoTs for model outputs.
- `t_analysis/tedge_generation.py` - Script for generating temperature-specific edges.
- `t_analysis/tpair_generation.py` - Script for generating temperature-specific pairs of data.