This repository contains the code for VPO: Leveraging the Number of Votes in Preference Optimization.
To perform the initial supervised fine-tuning (SFT) on a base model, run a command like:
bash script_train_sft_ufb.sh &
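The trailing & backgrounds the job. To capture its output in a log file you can inspect later, redirect stdout and stderr, for example (the log file name is just an example):

nohup bash script_train_sft_ufb.sh > sft_ufb.log 2>&1 &
tail -f sft_ufb.log    # follow training progress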
After SFT, to align the resulting model with VDPO, run a command like:
bash script_train_ufb.sh &
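For intuition only, and as our paraphrase rather than the paper's exact notation: VDPO can be read as DPO with the hard preference label replaced by a soft target estimated from the vote counts v_w and v_l of the preferred and rejected responses, e.g.

\mathcal{L}_{\text{VDPO}} = -\,\mathbb{E}\big[\hat{p}\,\log\sigma(\beta\Delta) + (1-\hat{p})\,\log\sigma(-\beta\Delta)\big], \qquad \hat{p} = \frac{v_w + 1}{v_w + v_l + 2},

where \Delta = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}. The Laplace-smoothed estimator \hat{p} shown here is illustrative; see the paper for the exact objective.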
If you are starting from an existing SFT model instead of training your own, run a command like:
bash script_train_shp.sh &
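If you need to point script_train_shp.sh at a specific checkpoint, check the script itself for where the model path is set. A hypothetical pattern, assuming the script reads an environment variable (the name MODEL_PATH is illustrative and not guaranteed to exist in the script):

# Hypothetical: only works if script_train_shp.sh actually reads this variable
MODEL_PATH=/path/to/your/sft_checkpoint bash script_train_shp.sh &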
After training, to sample outputs from the model, run a command like:
bash script_eval.sh &
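To run the whole pipeline unattended (SFT, then VDPO alignment, then sampling), the steps can be chained so each starts only after the previous one succeeds, assuming each script runs to completion in the foreground:

(bash script_train_sft_ufb.sh && bash script_train_ufb.sh && bash script_eval.sh) > pipeline.log 2>&1 &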
Our code is modified from https://github.com/ContextualAI/HALOs.