
Enhancement: Support for extra_info in Reward Calculation #266

Open: wants to merge 2 commits into main

maksimstw

Enhancement: Support for extra_info in Reward Calculation

Summary

This PR adds an extra_info parameter to reward computation. Users can now pass additional context, such as per-sample metadata, into the scoring function, which makes reward calculation easier to adapt to different datasets.

Changes Made

  • Updated _default_compute_score to accept an extra_info argument:
    def _default_compute_score(data_source, solution_str, ground_truth, extra_info):
  • Modified the reward manager (naive.py) to pass extra_info from data_item.non_tensor_batch to compute_score:
    extra_info = data_item.non_tensor_batch['extra_info']
    score = self.compute_score(
        data_source=data_source,
        solution_str=sequences_str,
        ground_truth=ground_truth,
        extra_info=extra_info,
    )
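
As an illustration of how a custom scoring function might consume the new argument, here is a minimal sketch. The function name `my_compute_score`, the `tolerance` key, and the matching logic are hypothetical and not part of this PR; only the four-argument signature mirrors the change above.

```python
def my_compute_score(data_source, solution_str, ground_truth, extra_info):
    """Hypothetical scorer: exact string match by default, or a numeric match
    within a tolerance read from extra_info (the 'tolerance' key is an
    assumption made for this example)."""
    # Treat a missing or None extra_info as an empty dict.
    tolerance = (extra_info or {}).get("tolerance")
    answer = solution_str.strip()
    if tolerance is None:
        return 1.0 if answer == str(ground_truth).strip() else 0.0
    try:
        return 1.0 if abs(float(answer) - float(ground_truth)) <= tolerance else 0.0
    except ValueError:
        # Non-numeric answer or ground truth: no credit.
        return 0.0
```

A function of this shape could be supplied as compute_score, so that per-sample settings travel with the dataset row rather than being hard-coded in the scorer.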

Why This Change?

  • Some datasets require additional context beyond data_source, solution_str, and ground_truth for accurate reward computation.
  • The new extra_info field allows users to pass custom metadata, ideally in dictionary form, as specified in the official documentation.
  • This change maintains compatibility with existing dataset processing scripts, as they already include the extra_info field.

Impact

  • Improved flexibility: Users can now pass additional contextual information, making reward computation more adaptable to different datasets.
  • Backward compatibility: Since all example datasets already include extra_info, this update should integrate seamlessly.
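
One caveat on backward compatibility: for a dataset without an extra_info column, the direct lookup `data_item.non_tensor_batch['extra_info']` would presumably raise a KeyError. A possible defensive variant is sketched below, assuming non_tensor_batch behaves like a dict; the helper name `get_extra_info` is hypothetical and not part of this PR.

```python
def get_extra_info(non_tensor_batch):
    """Return the extra_info entry if the batch has one, otherwise None.

    Assumes non_tensor_batch supports dict-style .get(); a plain dict is
    used here as a stand-in for illustration.
    """
    return non_tensor_batch.get("extra_info", None)


# With the field present, it is returned unchanged.
with_info = get_extra_info({"data_source": "demo", "extra_info": {"index": 3}})
# Without the field, the caller receives None instead of a KeyError.
without_info = get_extra_info({"data_source": "demo"})
```

With this kind of fallback, scoring functions would need to tolerate extra_info being None.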

Let me know if any modifications are needed!

pass in extra info to the reward function.
allowing extra_info to be passed in for more advanced compute_score
@vermouth1992
Collaborator

Could you format the code as described in the README?
