# VPO

This repository contains the code for *VPO: Leveraging the Number of Votes in Preference Optimization*.
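The title's core idea is to use the number of annotator votes behind each preference pair rather than a single hard label. As a rough illustration only (not necessarily the objective implemented in this repository), here is a minimal sketch assuming vote counts are turned into a Laplace-smoothed soft label for a DPO-style cross-entropy loss; every name and constant below is our own assumption.

```python
import torch
import torch.nn.functional as F

def vote_soft_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps,
                       votes_chosen, votes_rejected, beta=0.1):
    """Hypothetical sketch: a DPO-style loss with vote-derived soft labels."""
    # Soft target probability that the 'chosen' response is preferred;
    # add-one (Laplace) smoothing keeps the target strictly inside (0, 1).
    target = (votes_chosen + 1.0) / (votes_chosen + votes_rejected + 2.0)

    # Standard DPO implicit-reward margin between chosen and rejected.
    margin = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))

    # Cross-entropy against the soft label; a hard target of 1.0 here
    # would recover vanilla DPO.
    return F.binary_cross_entropy_with_logits(margin, target)

# Example: an 8-vs-2 vote split yields a soft target of 9/12 = 0.75.
logps = [torch.randn(4) for _ in range(4)]
loss = vote_soft_dpo_loss(*logps,
                          votes_chosen=torch.tensor([8., 3., 10., 1.]),
                          votes_rejected=torch.tensor([2., 3., 0., 9.]))
print(loss)
```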


To perform the initial SFT on a model, run a command like:

```bash
bash script_train_sft_ufb.sh &
```

Following SFT, to align the model with VDPO, run a command like:

```bash
bash script_train_ufb.sh &
```

If you are instead starting from an existing SFT model, run a command like:

```bash
bash script_train_shp.sh &
```

After training, to sample output from the model, run a command like:

```bash
bash script_eval.sh &
```
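To sample interactively in Python instead, a minimal sketch follows, assuming the trained policy has been exported to a Hugging Face-format checkpoint; the checkpoint path and generation settings below are hypothetical choices of ours, not the repository's.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint directory; replace with your actual output path.
ckpt = "outputs/vdpo_ufb/FINAL"

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Explain preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256,
                     do_sample=True, top_p=0.9, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```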


Our code is modified from https://github.com/ContextualAI/HALOs.