# v0.15.0
## Major and breaking changes
Coming soon
## What's Changed
- ⬆️ Bump dev version by @qgallouedec in #2689
- 📦
trl.templates
in excluded packages by @qgallouedec in #2690 - 📖 Docs fix spelling issues by @nnsW3 in #2682
- 📄 Add GRPO batch size note in docs by @sdpkjc in #2672
- 🙈 Fixed typo in the GRPO documentation by @famouswizard in #2691
- docs: Fix broken "Good First Issue" link in CONTRIBUTING.md by @famouswizard in #2693
- 🧠 Fix typo in "understand" in ppo_trainer.md by @famouswizard in #2695
- ☠️ Remove deprecated by @qgallouedec in #2692
- 💡 Add "Mini-R1: Reproduce Deepseek R1 'aha moment' a RL tutorial" by @qgallouedec in #2697
- 📋 Add eval loss logging during prediction in GRPO by @kashif in #2694
- fix: Fix typo in filename Update ultrafeedback.py by @brawncode in #2699
- 📖 Add GRPOTrainer to README.md by @burtenshaw in #2713
- Improve GRPO example by @lewtun in #2717
- 📖 Nit Fix in Documentation by @ParagEkbote in #2722
- 🏰
num_logits_to_keep
tologits_to_keep
by @qgallouedec in #2721 - 💰 Fix incorrect calculation in Olivia's baguette spending logic by @defiberrys in #2727
- fix: Fix typo in filename in ultrafeedback-prompt.py by @brawncode in #2716
- docs: Fix typos in alias descriptions by @defiberrys in #2729
- ⚠️ Fix Attention Masking in GRPO by @andyl98 in #2708
- 🔂 Use vLLM prefix caching for speedup by @winglian in #2757
- 💔 Decouple loss computing and generation in GRPO by @qgallouedec in #2762
- 📌 vLLM >= 0.7.1 for device fix by @ctjlewis in #2766
- 📐 Add vLLM dtype configuration for GRPO trainer by @joey00072 in #2738
- 📖 Clarification max len in Reward documentation by @ParagEkbote in #2740
- 🔎 Add missing script argument in PPO documentation by @JohnConnor123 in #2720
- 🤖 Properly unwrap torch.compile-ed models in GRPO by @winglian in #2750
- 🔁 🦈 Support iterative GRPO by @shirinyamani in #2700
- 🚧 Add Optional ZeRO-3 Weight Gathering for GRPO in Sequence Generation by @SeungyounShin in #2667
- ↔️ GRPO: Set `max_model_len` when initializing vLLM instance by @mirceapricop in #2728
- 💡 GRPO vram-efficiency improvement; only compute relevant logprobs by @tyler-romero in #2773
- 🙃 Fix reward function in GRPO example by @junuMoon in #2777
- 💡 Add 'Post training an LLM for reasoning with GRPO in TRL' tutorial by @sergiopaniego in #2785
- 📉 Optimize GRPO memory usage by redefining `per_device_batch_size` as generations per device by @qgallouedec in #2776
- 🆚 Distinguish padding and eos when they differ by @binary-husky in #2793
- 🎯 [SFT] add token accuracy metric by @kashif in #2597
- 📠 Log completions for GRPO by @qgallouedec in #2772
- 🔬 SFT simplification by @qgallouedec in #2405
- ➖ Fix GRPO example in README by @qgallouedec in #2800
- ⛰️ Reduce peak vram consumption with efficient selective log_softmax by @tyler-romero in #2799
- fix: typos in documentation files by @maximevtush in #2804
- 📤 GRPO refactor loading the model weights to vllm by @winglian in #2817
- 🫘 Add `set_seed()` call in GRPO to ensure unique seed for each process by @qgallouedec in #2824
- ⚖️ Add reward weight in multi-reward settings for GRPO by @hesamsheikh in #2676
- 🙌 Share vLLM device with training when only 1 available by @qgallouedec in #2827
- 👴 Update `tokenizer` parameter to `processing_class` in tests by @qgallouedec in #2828
- 🥾 Allow bootstrap GRPO by @qgallouedec in #2829
- ⚡ Fix GRPO PEFT by @qgallouedec in #2725
- Fix PeftModel check when moving weights to vLLM by @edbeeching in #2850
- 🪆 Fix for Incorrect ValueError Handling in reward_weights in grpo_trainer.py by @loveychen in #2843
- 👨👩👧 GRPO + PEFT + vLLM by @winglian in #2818
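Two of the GRPO memory items above (#2773 and #2799) hinge on the same trick: compute log-probabilities only for the sampled token ids instead of materializing a full-vocabulary log-softmax tensor. A minimal sketch of the idea follows; the function name and tensor shapes are illustrative, not TRL's exact implementation:

```python
import torch

def selective_log_softmax(logits: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
    """Log-probabilities of the chosen token ids only.

    Numerically equivalent to
        logits.log_softmax(-1).gather(-1, index.unsqueeze(-1)).squeeze(-1)
    but never allocates a second (batch, seq, vocab)-sized tensor
    for the log-probabilities.
    """
    # log softmax(x)_i = x_i - logsumexp(x), so gather first, then reduce.
    selected = torch.gather(logits, dim=-1, index=index.unsqueeze(-1)).squeeze(-1)
    return selected - torch.logsumexp(logits, dim=-1)
```

Because `torch.logsumexp` reduces over the vocab dimension in one pass, the only extra allocations are `(batch, seq)`-shaped, which is what lowers peak VRAM when the vocabulary is large.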
## New Contributors
- @nnsW3 made their first contribution in #2682
- @sdpkjc made their first contribution in #2672
- @famouswizard made their first contribution in #2691
- @brawncode made their first contribution in #2699
- @ParagEkbote made their first contribution in #2722
- @defiberrys made their first contribution in #2727
- @ctjlewis made their first contribution in #2766
- @joey00072 made their first contribution in #2738
- @JohnConnor123 made their first contribution in #2720
- @shirinyamani made their first contribution in #2700
- @mirceapricop made their first contribution in #2728
- @tyler-romero made their first contribution in #2773
- @junuMoon made their first contribution in #2777
- @binary-husky made their first contribution in #2793
- @maximevtush made their first contribution in #2804
- @hesamsheikh made their first contribution in #2676
- @loveychen made their first contribution in #2843
**Full Changelog**: v0.14.0...v0.15.0