This repository contains the code for our paper "Discovering Preference Optimization Algorithms with and for Large Language Models".
The training code is largely adapted from huggingface/alignment-handbook.
To run the code in this project, first create a Python virtual environment, e.g. with Conda:

```shell
conda create -n handbook python=3.10 && conda activate handbook
```
Next, install PyTorch v2.1.2. The precise version is important for reproducibility! Since installation is hardware-dependent, we direct you to the PyTorch Installation Page.
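For example, on a machine with CUDA 12.1 the install command might look like the following (the CUDA version here is an assumption; use the exact command the PyTorch page gives for your hardware):

```shell
pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121
```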
You can then install the remaining package dependencies as follows:

```shell
python -m pip install .
```
You will also need Flash Attention 2 installed, which can be done by running:

```shell
python -m pip install flash-attn==2.5.7 --no-build-isolation
```
Note: if your machine has less than 96GB of RAM and many CPU cores, reduce the `MAX_JOBS` argument, e.g.:

```shell
MAX_JOBS=4 pip install flash-attn==2.5.7 --no-build-isolation
```
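To confirm the build succeeded, a quick import check (our suggestion, not part of the original setup) is:

```shell
python -c "import flash_attn; print(flash_attn.__version__)"
```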
Next, log into your Hugging Face and Wandb accounts as follows:

```shell
huggingface-cli login
wandb login
```
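On a headless machine, you can instead pass credentials non-interactively, assuming you have exported `HF_TOKEN` and `WANDB_API_KEY` (hypothetical variable names) in your environment:

```shell
huggingface-cli login --token $HF_TOKEN
wandb login $WANDB_API_KEY
```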
Finally, install Git LFS so that you can push models to the Hugging Face Hub:

```shell
sudo apt-get install git-lfs
```
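After the package is installed, Git LFS still needs to be activated once per machine:

```shell
git lfs install
```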
Then, install FastChat for MT-Bench as follows (starting from inside this repo, so that FastChat is cloned alongside it):

```shell
cd ../
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip install -e ".[model_worker,llm_judge]"
```
Make sure that FastChat loads the correct chat template for Zephyr-Gemma; see this issue for the template.
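As a quick check (our suggestion), you can confirm which template FastChat resolves; the template name `zephyr` below is an assumption, so use whichever name the linked issue specifies for Zephyr-Gemma:

```python
# Verify FastChat resolves the expected conversation template.
from fastchat.model.model_adapter import get_conversation_template

conv = get_conversation_template("zephyr")  # assumed template name
print(conv.name, conv.roles)
```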
To launch the evolution script:

```shell
python3 scripts/launch_evo.py --wandb
```
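For reference, the evolution searches over DPO-style preference objectives. The sketch below shows the standard DPO sigmoid loss in that form; it is illustrative only, and the function signature is our assumption rather than the exact interface used by `scripts/launch_evo.py`:

```python
import torch
import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps: torch.Tensor,
                     policy_rejected_logps: torch.Tensor,
                     ref_chosen_logps: torch.Tensor,
                     ref_rejected_logps: torch.Tensor,
                     beta: float = 0.05) -> torch.Tensor:
    """Standard DPO loss (illustrative; signature is hypothetical)."""
    # rho: difference between the policy and reference log-ratios
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    rho = pi_logratios - ref_logratios
    # Negative log-sigmoid of the scaled log-ratio difference
    return -F.logsigmoid(beta * rho).mean()
```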
Finally, you need to install Alpaca Eval 2.0. Annoyingly, `alpaca_eval` uses `openai>1.5.0` while `mt-bench` uses `openai==0.28`, and the newer version is not backward compatible. Therefore, we need to create a second conda environment that is a copy of the first:

```shell
conda create --name handbook_alpaca --clone handbook
conda activate handbook_alpaca
```
Subsequently, we install `alpaca_eval` as follows:

```shell
pip install alpaca-eval
```
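As a sanity check (our suggestion), confirm that each environment ends up with the expected `openai` version:

```shell
conda activate handbook && python -c "import openai; print(openai.__version__)"         # expect 0.28.x
conda activate handbook_alpaca && python -c "import openai; print(openai.__version__)"  # expect >= 1.5.0
```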
We have also created an extra folder in this repo named `alpaca_eval`, where we store all the model and API config files.
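For orientation, a model config for `alpaca_eval` typically looks roughly like the sketch below; the model name, paths, and kwargs are hypothetical, so check the configs shipped in the `alpaca_eval` folder for the exact schema:

```yaml
# configs.yaml -- hypothetical example; adapt names and paths to your model
my_discopop_model:
  prompt_template: "my_discopop_model/prompt.txt"
  fn_completions: "huggingface_local_completions"
  completions_kwargs:
    model_name: "<path_to_model_weights_or_HF>"
    max_new_tokens: 2048
```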
Whenever you want to run an `mt-bench` model evaluation, you can do so with the following commands:

```shell
conda activate handbook
python scripts/run_evaluations.py \
    --model-id <name_of_your_model> \
    --model-path <path_to_model_weights_or_HF> \
    --num-generations 1 \
    --mt-bench
```
Whenever you want to run an `alpaca_eval` model evaluation, you can do so with the following commands:

```shell
conda activate handbook_alpaca
python scripts/run_evaluations.py \
    --model-id <name_of_your_model> \
    --num-generations 1 \
    --alpaca-eval \
    --alpaca-model <path_to_your_model_config>/configs.yaml \
    --alpaca-reference-model <path_to_ref_model_config>/configs.yaml \
    --alpaca-openai-configs <path_to_your_client_config>/openai_configs.yaml
```
If you want to run training and evaluation together, we have prepared bash scripts:

```shell
source scripts/train_tldr.sh
source scripts/eval_tldr.sh
source scripts/train_eval_imdb.sh
```
If you find this code useful, please cite our paper:

```bibtex
@article{lu2024discopop,
  title={Discovering Preference Optimization Algorithms with and for Large Language Models},
  author={Lu, Chris and Holt, Samuel and Fanconi, Claudio and Chan, Alex J and Foerster, Jakob and van der Schaar, Mihaela and Lange, Robert Tjarko},
  journal={arXiv preprint arXiv:2406.08414},
  year={2024}
}
```