
Support reproducing reasoning models such as DeepSeek-R1 with GRPO, achieving results similar to huggingface open-r1 #6792

Open
1 task done
submartingales opened this issue Feb 2, 2025 · 1 comment
Labels
enhancement (New feature or request), pending (This problem is yet to be addressed)

Comments

@submartingales

Reminder

  • I have read the above rules and searched the existing issues.

Description

Support reproducing DeepSeek-R1's results on top of existing models. This mainly requires integrating the GRPO algorithm, which is already implemented in git+https://github.com/huggingface/trl.git
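
For context on what the integration involves: the core of GRPO is scoring each sampled completion relative to the other completions drawn for the same prompt, instead of using a learned value model. Below is a minimal sketch of that normalization step in plain Python. It is not the trl implementation; the function name, the `eps` value, the use of population standard deviation, and the example rewards are all assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for completions sampled from ONE prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of the same prompt,
# e.g. 1.0 when the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where all completions score the same carries no learning signal.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Completions that beat their group average get a positive advantage and the rest get a negative one, which is what the policy update then weights.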

Pull Request

No response

@submartingales added the enhancement (New feature or request) and pending (This problem is yet to be addressed) labels on Feb 2, 2025
@Syazvinski

+1
