implement REINFORCE++ algorithm #228

4332001876 · 2025-02-08T14:40:15Z

We have implemented the REINFORCE++ algorithm.

To use it, set the configuration option algorithm.adv_estimator=reinforce_plus_plus.

Preliminary performance evaluations were conducted within the Unakar/Logic-RL project, a reproduction of DeepSeek R1 Zero on the 2K Tiny Logic Puzzle Dataset. Results indicate that our REINFORCE++ implementation exhibits performance and training stability comparable to, or potentially exceeding, that of PPO and GRPO.
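For readers unfamiliar with the estimator, the following is a minimal sketch of how a REINFORCE++-style advantage can be computed: a discounted return-to-go per token, followed by normalization over all valid tokens in the batch. It is written in plain Python for illustration and is not the code in verl/trainer/ppo/core_algos.py; the function name `reinforce_pp_advantages`, its arguments, and the `eps` constant are assumptions made for this example.

```python
import math


def reinforce_pp_advantages(token_rewards, mask, gamma=1.0, eps=1e-8):
    """Illustrative REINFORCE++-style advantages (hypothetical helper).

    token_rewards: list of rows, one per response, of per-token rewards.
    mask:          same shape; 1 for response tokens, 0 for padding.
    Assumes at least one unmasked token in the batch.
    """
    returns = []
    for rewards_row, mask_row in zip(token_rewards, mask):
        running = 0.0
        row = [0.0] * len(rewards_row)
        # Accumulate the discounted return-to-go from the last token backwards.
        for t in reversed(range(len(rewards_row))):
            running = rewards_row[t] + gamma * running
            row[t] = running
            # Zero the accumulator on padding so it cannot leak into real tokens.
            running *= mask_row[t]
        returns.append(row)

    # Normalize with statistics over every valid token in the whole batch.
    valid = [r for row, m in zip(returns, mask) for r, k in zip(row, m) if k]
    mean = sum(valid) / len(valid)
    var = sum((r - mean) ** 2 for r in valid) / len(valid)
    scale = 1.0 / math.sqrt(var + eps)

    advantages = [
        [(r - mean) * scale * k for r, k in zip(row, m)]
        for row, m in zip(returns, mask)
    ]
    return advantages, returns


# Two responses: the first earns an outcome reward of 1 on its last token,
# the second earns nothing and has one padding position.
adv, ret = reinforce_pp_advantages(
    token_rewards=[[0, 0, 1], [0, 0, 0]],
    mask=[[1, 1, 1], [1, 1, 0]],
)
```

Because the normalization uses batch-wide statistics and no learned value function, tokens of the rewarded response receive positive advantages and tokens of the unrewarded response receive negative ones, without a critic model.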

Related issue: #68

verl/trainer/ppo/core_algos.py

vermouth1992 · 2025-02-09T02:35:01Z

Could you format the code as described in the README?

4332001876 · 2025-02-09T07:49:38Z

Modifications made according to your feedback.

vermouth1992 · 2025-02-09T08:02:21Z

The formatting CI still fails.

4332001876 · 2025-02-09T08:37:44Z

Resolved formatting issues by running the script; sorry for the initial mistake.

verl/trainer/ppo/ray_trainer.py

implement REINFORCE++ algorithm

62ca7b8

vermouth1992 reviewed Feb 8, 2025

verl/trainer/ppo/core_algos.py (outdated, resolved)

add citation for R++; formatting

20fbf5c

formatting

83b0cd4

eric-haibin-lin reviewed Feb 9, 2025

verl/trainer/ppo/ray_trainer.py (resolved)

update related document

099aee0

vermouth1992 approved these changes Feb 9, 2025

vermouth1992 merged commit bdb50ac into volcengine:main Feb 9, 2025
11 checks passed

PeterSH6 mentioned this pull request Feb 9, 2025

Feature/add remax support #234

Status: Open
