Support reproducing reasoning models such as DeepSeek-R1 via GRPO, achieving results similar to huggingface open-r1 #6792

submartingales · 2025-02-02T07:36:17Z

Please support reproducing DeepSeek-R1's results on top of an existing model. The main work is integrating the GRPO algorithm, which is already implemented in git+https://github.com/huggingface/trl.git
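For context, GRPO samples a group of completions per prompt and replaces a learned value baseline with a group-relative one. Each completion's reward is normalized against the mean and standard deviation of its own group. The snippet below is a minimal sketch of that advantage step only, not TRL's implementation. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Hypothetical helper: normalize the rewards of one group of completions
    sampled for the same prompt into group-relative advantages.

    Assumptions: population std is used, and `eps` is added to the std so a
    group with identical rewards yields zero advantages instead of dividing by 0.
    """
    mu = mean(rewards)       # group baseline (replaces a learned critic)
    sigma = pstdev(rewards)  # spread of rewards within the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: 4 completions for one prompt, scored 1.0 (correct) or 0.0 (wrong)
# by a rule-based reward. Correct answers get positive advantages.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that beat their group's average are reinforced and the rest are penalized. When every completion in a group gets the same reward, that group contributes no learning signal.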



Syazvinski · 2025-02-02T19:29:20Z

+1

submartingales added the labels enhancement (New feature or request) and pending (This problem is yet to be addressed) on Feb 2, 2025
