Feature/add remax support #234
base: main
Conversation
Hi @liziniu, thank you for your contribution! Judging from our implementation, I think ReMax can be implemented by adding a few lines to the original PPO/GRPO/REINFORCE implementation instead of writing a new trainer, which makes maintenance easier. Correct me if this is wrong.
+1. From my understanding, ReMax can be implemented similarly to REINFORCE++, with a different advantage estimator. See the REINFORCE++ implementation: #228
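For context, here is a minimal sketch of what such an advantage estimator could look like, following the shape conventions of verl's existing outcome-level estimators. The function name, signature, and tensor layout are assumptions for illustration, not the PR's actual code:

```python
import torch

def compute_remax_outcome_advantage(token_level_rewards: torch.Tensor,
                                    reward_baselines: torch.Tensor,
                                    response_mask: torch.Tensor):
    """ReMax advantage: reward of the sampled response minus the reward of a
    greedy rollout from the same prompt, broadcast over response tokens.

    token_level_rewards: (bs, resp_len) sparse reward placed at the EOS token
    reward_baselines:    (bs,) scalar reward of the greedy rollout per prompt
    response_mask:       (bs, resp_len) 1 for valid response tokens, else 0
    """
    with torch.no_grad():
        scores = token_level_rewards.sum(dim=-1)   # (bs,) total reward per sample
        advantages = scores - reward_baselines     # greedy-baseline subtraction
        advantages = advantages.unsqueeze(-1) * response_mask
    # With no critic, the returns can simply reuse the advantages.
    return advantages, advantages
```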
@@ -41,7 +41,7 @@ verl is fast with:
 - **vLLM** and **TGI** for rollout generation, **SGLang** support coming soon.
 - huggingface models support
 - Supervised fine-tuning
-- Reinforcement learning from human feedback with [PPO](https://github.com/volcengine/verl/tree/main/examples/ppo_trainer) and [GRPO](https://github.com/volcengine/verl/tree/main/examples/grpo_trainer)
+- Reinforcement learning from human feedback with [PPO](https://github.com/volcengine/verl/tree/main/examples/ppo_trainer), [GRPO](https://github.com/volcengine/verl/tree/main/examples/grpo_trainer), and [ReMax](https://github.com/volcengine/verl/tree/main/examples/remax_trainer)
If you already have the training log and wandb curves, would you mind adding one more record to docs/experiment/ppo.rst to include ReMax? It would help the community track whether the experiment can be reproduced in future versions.
We can do that in the next PR
Yes. A preliminary result on Qwen2.5-3B has been added, and more results will follow.
@vermouth1992 @PeterSH6 I see. Let me reformat the code with minimal changes to the PPO trainer.
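To illustrate what "minimal changes" could mean in practice, the trainer's advantage computation might only need one extra branch, plus one extra greedy rollout per prompt to score the baseline. This is a hypothetical sketch reusing the estimator above; the function and dictionary keys are assumed names, not the PR's actual code:

```python
def compute_advantage(data: dict, adv_estimator: str):
    """Dispatch to an advantage estimator selected by config."""
    if adv_estimator == "remax":
        # reward_baselines comes from scoring one greedy (do_sample=False)
        # rollout per prompt with the same reward function.
        advantages, returns = compute_remax_outcome_advantage(
            token_level_rewards=data["token_level_rewards"],
            reward_baselines=data["reward_baselines"],
            response_mask=data["response_mask"],
        )
    else:
        # existing GAE/GRPO/REINFORCE++ paths, elided in this sketch
        raise NotImplementedError(f"estimator {adv_estimator} not sketched")
    data["advantages"], data["returns"] = advantages, returns
    return data
```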
I have completed the implementation of ReMax support. The changes include:
The code follows the project's style guidelines. Please review when you have a chance. Let me know if any changes or clarifications are needed. Thank you for your time!
Could you add a CI job that runs ReMax with Qwen 0.5B to protect this functionality? You can follow the example here: https://github.com/volcengine/verl/blob/main/.github/workflows/e2e_gsm8k.yml#L69
Description
Added ReMax support to verl. ReMax is a simple, efficient, and stable RL algorithm customized for LLM training, with theoretical guarantees for variance reduction.
The HybridFlow paper experimented with ReMax, but verl did not yet provide an implementation; this PR adds one.
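For reference, the gradient estimator from the ReMax paper uses the reward of a greedy decode as a per-prompt baseline:

$$
\widehat{\nabla}_\theta J(\theta) = \big( r(x, y) - r(x, \bar{y}) \big) \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(y_t \mid x, y_{<t}), \qquad y \sim \pi_\theta(\cdot \mid x),
$$

where $\bar{y}$ is the greedy (argmax) response to the same prompt $x$. Since the baseline $r(x, \bar{y})$ costs only one extra generation and requires no value network, this is what motivates the efficiency and variance-reduction claims above.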
Changes
Testing
Validation reward when optimizing Qwen2.5-3B-Instruct on the GSM8K dataset:
The curve demonstrates the effectiveness of ReMax, though its performance could be further improved through hyperparameter tuning.
Documentation
Checklist