Any support for PEFT methods on PPO? #159
I am working on a project applying RL to LLMs, but I have very limited resources. I hope the veRL team can support PEFT methods such as LoRA in your PPO trainer.

Comments
Hi, we are working on it; we'll try to send a PR in 2 weeks.
Shall we have a discussion on how this should be implemented?
I think we can organize a quick Zoom meeting to discuss this. cc: @Jiayi-Pan
I'd also be interested in joining the meeting if it will happen in English. 🙂
Hi @corbt, we haven't started yet, but here is my draft. The implementation is pretty straightforward.
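For anyone following along, here is a minimal sketch of what the LoRA-wrapping step could look like using Hugging Face PEFT. The model name, rank, and target module names below are placeholders for illustration, not part of the actual draft:

```python
# Sketch: wrap the PPO actor with a LoRA adapter so that only the
# adapter weights are trained, while the base weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

actor = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")  # placeholder model

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                                  # adapter rank (placeholder)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # module names depend on the architecture
)
actor = get_peft_model(actor, lora_config)
actor.print_trainable_parameters()         # only LoRA weights require grad
```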
Another optimization we can do is to use the base model as the reference policy. This can also be done easily with the disable_adapters API (https://huggingface.co/docs/transformers/main/en/peft). In this way, we keep only one model for the actor/rollout/reference policy, and the memory consumption of GRPO becomes identical to that of DPO.
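A rough illustration of that single-model trick, continuing the sketch above (`actor` is the PeftModel from before; `input_ids` is assumed to be a batch of rollout token IDs, and `compute_logprobs` is a hypothetical helper, not a veRL API):

```python
# With the adapter temporarily disabled, the very same parameters act as
# the frozen reference policy, so no separate reference model is kept.
import torch

def compute_logprobs(model, input_ids):
    """Hypothetical helper: per-token log-probs of the given sequence."""
    logits = model(input_ids).logits[:, :-1, :]          # predict tokens 1..T-1
    logprobs = torch.log_softmax(logits, dim=-1)
    return logprobs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)

actor_logprobs = compute_logprobs(actor, input_ids)      # LoRA adapter active
with actor.disable_adapter():                            # PeftModel context manager
    with torch.no_grad():
        ref_logprobs = compute_logprobs(actor, input_ids)  # base weights only
kl = actor_logprobs - ref_logprobs                       # per-token log-ratio (KL estimate)
```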
Hi, I have some collaborators interested in using this as their first PR to veRL. It should be ready in 2 days.