Add Remote Reward Server Feature #419

YuchenFan48 · 2025-02-28T07:23:23Z

Description

This PR adds support for using remote generative models for verification and reward assignment in VERL. Previously, VERL supported only rule-based rewards for verification. With this update, rewards can also be produced by a generative model, which allows more flexible reward mechanisms.

Key Changes

Bug Fix:

Fixed an issue in main_ppo.py where the val_dataloader was incorrectly processed in batches, so that validation data is now handled correctly during training.

New Features:

Added a Generative Reward Manager in workers/reward_manager to handle reward generation using remote generative models.
Implemented a corresponding compute_score function in utils/reward_score to calculate scores based on generative model outputs.
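
To illustrate the idea behind a generative compute_score, here is a minimal sketch and not the code in this PR. It assumes the remote model is reached through a caller-supplied `query_fn` (a hypothetical callable that sends a prompt and returns the reply text) and that the judge is asked to answer with a line like `Score: 1`. The prompt template, the regex, and the fallback of 0.0 on failure are all assumptions made for this example.

```python
import re
from typing import Callable

# Hypothetical judge prompt; the actual wording used by the PR is not shown here.
JUDGE_TEMPLATE = (
    "Reference answer: {ground_truth}\n"
    "Model answer: {solution}\n"
    "Reply with 'Score: 1' if the model answer is correct, otherwise 'Score: 0'."
)

# Matches the first "Score: <number>" in the judge's reply.
_SCORE_RE = re.compile(r"Score:\s*(\d+(?:\.\d+)?)")


def compute_score(solution_str: str, ground_truth: str,
                  query_fn: Callable[[str], str]) -> float:
    """Score one response with a remote generative judge.

    query_fn is a hypothetical callable (for example, a thin HTTP client)
    that sends the prompt to the remote model and returns its text reply.
    """
    prompt = JUDGE_TEMPLATE.format(ground_truth=ground_truth,
                                   solution=solution_str)
    try:
        reply = query_fn(prompt)
    except Exception:
        # Assumption: a failed remote call yields zero reward instead of
        # interrupting training.
        return 0.0

    match = _SCORE_RE.search(reply)
    if match is None:
        # The judge gave no parseable verdict.
        return 0.0

    # Clamp to [0, 1] so a malformed reply cannot produce an outsized reward.
    return max(0.0, min(1.0, float(match.group(1))))


if __name__ == "__main__":
    # Stand-in for the remote model, used only to exercise the function.
    def fake_judge(prompt: str) -> str:
        return "The answers agree. Score: 1"

    print(compute_score("42", "42", fake_judge))
```

Passing the remote call in as a function keeps the scoring logic testable without a running reward server; a reward manager could then call such a function once per decoded response.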

Issue Resolution:

Closes #269 and #229: Adds support for remote generative reward mechanisms.

YuchenFan48 and others added 11 commits February 20, 2025 18:35

Remote Gen Reward

a46876a

Remote Gen Reward

c351370

Remote Gen Reward

8cbe8a2

Remote Gen Reward

9c2e861

Update

ebbfe37

Update

44a6cec

Update

148a38a

Update run_ppo.sh

56a13a8

Update

41ba36b

Update

a1c9b55

Add Licences

06acafd
