Any support for peft methods on PPO? #159

Open
HaochenZhao opened this issue Jan 29, 2025 · 7 comments
Comments

@HaochenZhao

I am working on a project that applies RL to LLMs, but I have very limited resources. I hope the verl team can support PEFT methods such as LoRA in the PPO trainer.

@Jiayi-Pan
Contributor

Hi, we are working on it. We'll try to send a PR in 2 weeks.

@vermouth1992
Collaborator

> Hi, we are working on it. We'll try to send a PR in 2 weeks.

Shall we have a discussion on how this should be implemented?

@PeterSH6
Collaborator

I think we can organize a quick Zoom meeting to discuss this. cc: @Jiayi-Pan

@corbt
Contributor

corbt commented Jan 31, 2025

I'd also be interested in joining the meeting if it will happen in English. 🙂

@vermouth1992
Collaborator

> I'd also be interested in joining the meeting if it will happen in English. 🙂

Hi @corbt, we haven't started yet, but here is my draft. The implementation is pretty straightforward.

@vermouth1992
Collaborator

vermouth1992 commented Feb 3, 2025

Another optimization we can make is to use the base model as the reference policy. This can be done easily with the disable_adapters API: https://huggingface.co/docs/transformers/main/en/peft

This way, we keep only one model for the actor, rollout, and reference policy, and the memory consumption of GRPO is identical to that of DPO.
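
To illustrate the idea in the comment above, here is a minimal, dependency-free sketch of a LoRA-style layer whose adapter can be switched off to recover the base model's output. It is not verl's or PEFT's implementation: `ToyLoRALinear`, its `disable_adapter` context manager, and the toy weights are all hypothetical names chosen for illustration. The point is only that the actor (base + adapter) and the reference policy (base alone) can share one copy of the base weights.

```python
from contextlib import contextmanager


class ToyLoRALinear:
    """Hypothetical toy layer: y = W x + scale * B (A x), with a switchable adapter."""

    def __init__(self, weight, lora_a, lora_b, scale=1.0):
        self.weight = weight    # frozen base weights, shape [out][in]
        self.lora_a = lora_a    # trainable down-projection, shape [r][in]
        self.lora_b = lora_b    # trainable up-projection, shape [out][r]
        self.scale = scale
        self.adapter_enabled = True

    @staticmethod
    def _matvec(matrix, vector):
        # Plain-Python matrix-vector product.
        return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

    def forward(self, x):
        base = self._matvec(self.weight, x)
        if not self.adapter_enabled:
            return base  # reference-policy path: base weights only
        delta = self._matvec(self.lora_b, self._matvec(self.lora_a, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    @contextmanager
    def disable_adapter(self):
        """Temporarily switch the adapter off, then restore the previous state."""
        previous = self.adapter_enabled
        self.adapter_enabled = False
        try:
            yield self
        finally:
            self.adapter_enabled = previous


layer = ToyLoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],
    lora_b=[[0.5], [-0.5]],
)
x = [2.0, 3.0]

actor_out = layer.forward(x)          # actor: base + adapter
with layer.disable_adapter():
    reference_out = layer.forward(x)  # reference: same weights, adapter off
```

In a real trainer the same pattern would wrap the reference log-probability computation, so no second copy of the model needs to be held in memory.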

@Jiayi-Pan
Contributor

Hi, I have some collaborators who are interested in taking this on as their first PR to veRL. It should be ready in 2 days.
