Any support for PEFT methods on PPO? #159
I am working on a project applying RL to LLMs, but I have very limited resources. I hope the veRL team can support PEFT methods such as LoRA in your PPO trainer.

Comments
Hi, we are working on it; we'll try to send a PR in 2 weeks.
Shall we have a discussion on how this should be implemented?
I think we can organize a quick Zoom meeting to discuss this. cc: @Jiayi-Pan
I'd also be interested in joining the meeting if it will happen in English. 🙂
Hi @corbt, we haven't started yet, but here is my draft. The implementation is pretty straightforward.
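For anyone following along, here is a minimal sketch of what the LoRA-wrapping step could look like using Hugging Face PEFT. The model name, rank, and target module names below are placeholders for illustration, not part of the actual draft:

```python
# Sketch: wrap the PPO actor with a LoRA adapter so that only the
# adapter weights are trained, while the base weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

actor = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")  # placeholder model

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                                  # adapter rank (placeholder)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # module names depend on the architecture
)
actor = get_peft_model(actor, lora_config)
actor.print_trainable_parameters()         # only LoRA weights require grad
```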
Another optimization we can do is to use the base model as the reference policy. This can also be done easily with the disable_adapters API (https://huggingface.co/docs/transformers/main/en/peft). In this way, we keep only one model for the actor/rollout/reference policy, and the memory consumption of GRPO becomes identical to that of DPO.
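A rough illustration of that single-model trick, continuing the sketch above (`actor` is the PeftModel from before; `input_ids` is assumed to be a batch of rollout token IDs, and `compute_logprobs` is a hypothetical helper, not a veRL API):

```python
# With the adapter temporarily disabled, the very same parameters act as
# the frozen reference policy, so no separate reference model is kept.
import torch

def compute_logprobs(model, input_ids):
    """Hypothetical helper: per-token log-probs of the given sequence."""
    logits = model(input_ids).logits[:, :-1, :]          # predict tokens 1..T-1
    logprobs = torch.log_softmax(logits, dim=-1)
    return logprobs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)

actor_logprobs = compute_logprobs(actor, input_ids)      # LoRA adapter active
with actor.disable_adapter():                            # PeftModel context manager
    with torch.no_grad():
        ref_logprobs = compute_logprobs(actor, input_ids)  # base weights only
kl = actor_logprobs - ref_logprobs                       # per-token log-ratio (KL estimate)
```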
Hi, I have some collaborators interested in using this as their first PR to veRL. It should be ready in 2 days.