# v0.15.0
## Major and breaking changes
Coming soon
## What's Changed
- ⬆️ Bump dev version by @qgallouedec in #2689
- 📦
trl.templates
in excluded packages by @qgallouedec in #2690 - 📖 Docs fix spelling issues by @nnsW3 in #2682
- 📄 Add GRPO batch size note in docs by @sdpkjc in #2672
- 🙈 Fixed typo in the GRPO documentation by @famouswizard in #2691
- docs: Fix broken "Good First Issue" link in CONTRIBUTING.md by @famouswizard in #2693
- 🧠 Fix typo in "understand" in ppo_trainer.md by @famouswizard in #2695
- ☠️ Remove deprecated by @qgallouedec in #2692
- 💡 Add "Mini-R1: Reproduce Deepseek R1 'aha moment' a RL tutorial" by @qgallouedec in #2697
- 📋 Add eval loss logging during prediction in GRPO by @kashif in #2694
- fix: Fix typo in filename Update ultrafeedback.py by @brawncode in #2699
- 📖 Add GRPOTrainer to README.md by @burtenshaw in #2713
- Improve GRPO example by @lewtun in #2717
- 📖 Nit Fix in Documentation by @ParagEkbote in #2722
- 🏰
num_logits_to_keep
tologits_to_keep
by @qgallouedec in #2721 - 💰 Fix incorrect calculation in Olivia's baguette spending logic by @defiberrys in #2727
- fix: Fix typo in filename in ultrafeedback-prompt.py by @brawncode in #2716
- docs: Fix typos in alias descriptions by @defiberrys in #2729
- ⚠️ Fix Attention Masking in GRPO by @andyl98 in #2708
- 🔂 Use vLLM prefix caching for speedup by @winglian in #2757
- 💔 Decouple loss computing and generation in GRPO by @qgallouedec in #2762
- 📌 vLLM >= 0.7.1 for device fix by @ctjlewis in #2766
- 📐 Add vLLM dtype configuration for GRPO trainer by @joey00072 in #2738
- 📖 Clarification max len in Reward documentation by @ParagEkbote in #2740
- 🔎 Add missing script argument in PPO documentation by @JohnConnor123 in #2720
- 🤖 Properly unwrap torch.compile-ed models in GRPO by @winglian in #2750
- 🔁 🦈 Support iterative GRPO by @shirinyamani in #2700
- 🚧 Add Optional ZeRO-3 Weight Gathering for GRPO in Sequence Generation by @SeungyounShin in #2667
- ↔️ GRPO: Set `max_model_len` when initializing vLLM instance by @mirceapricop in #2728
- 💡 GRPO vram-efficiency improvement; only compute relevant logprobs by @tyler-romero in #2773
- 🙃 Fix reward function in GRPO example by @junuMoon in #2777
- 💡 Add 'Post training an LLM for reasoning with GRPO in TRL' tutorial by @sergiopaniego in #2785
- 📉 Optimize GRPO memory usage by redefining `per_device_batch_size` as generations per device by @qgallouedec in #2776
- 🆚 Distinguish padding and eos when they differ by @binary-husky in #2793
- 🎯 [SFT] add token accuracy metric by @kashif in #2597
- 📠 Log completions for GRPO by @qgallouedec in #2772
- 🔬 SFT simplification by @qgallouedec in #2405
- ➖ Fix GRPO example in README by @qgallouedec in #2800
- ⛰️ Reduce peak vram consumption with efficient selective log_softmax by @tyler-romero in #2799
- fix: typos in documentation files by @maximevtush in #2804
- 📤 GRPO refactor loading the model weights to vllm by @winglian in #2817
- 🫘 Add `set_seed()` call in GRPO to ensure unique seed for each process by @qgallouedec in #2824
- ⚖️ Add reward weight in multi-reward settings for GRPO by @hesamsheikh in #2676
- 🙌 Share vLLM device with training when only 1 available by @qgallouedec in #2827
- 👴 Update `tokenizer` parameter to `processing_class` in tests by @qgallouedec in #2828
- 🥾 Allow bootstrap GRPO by @qgallouedec in #2829
- ⚡ Fix GRPO PEFT by @qgallouedec in #2725
- Fix PeftModel check when moving weights to vLLM by @edbeeching in #2850
- 🪆 Fix for Incorrect ValueError Handling in reward_weights in grpo_trainer.py by @loveychen in #2843
- 👨👩👧 GRPO + PEFT + vLLM by @winglian in #2818
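Two of the GRPO memory items above (#2773 and #2799) hinge on the same trick: compute log-probabilities only for the sampled token ids instead of materializing a full-vocabulary log-softmax tensor. A minimal sketch of the idea follows; the function name and tensor shapes are illustrative, not TRL's exact implementation:

```python
import torch

def selective_log_softmax(logits: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
    """Log-probabilities of the chosen token ids only.

    Numerically equivalent to
        logits.log_softmax(-1).gather(-1, index.unsqueeze(-1)).squeeze(-1)
    but never allocates a second (batch, seq, vocab)-sized tensor
    for the log-probabilities.
    """
    # log softmax(x)_i = x_i - logsumexp(x), so gather first, then reduce.
    selected = torch.gather(logits, dim=-1, index=index.unsqueeze(-1)).squeeze(-1)
    return selected - torch.logsumexp(logits, dim=-1)
```

Because `torch.logsumexp` reduces over the vocab dimension in one pass, the only extra allocations are `(batch, seq)`-shaped, which is what lowers peak VRAM when the vocabulary is large.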
## New Contributors
- @nnsW3 made their first contribution in #2682
- @sdpkjc made their first contribution in #2672
- @famouswizard made their first contribution in #2691
- @brawncode made their first contribution in #2699
- @ParagEkbote made their first contribution in #2722
- @defiberrys made their first contribution in #2727
- @ctjlewis made their first contribution in #2766
- @joey00072 made their first contribution in #2738
- @JohnConnor123 made their first contribution in #2720
- @shirinyamani made their first contribution in #2700
- @mirceapricop made their first contribution in #2728
- @tyler-romero made their first contribution in #2773
- @junuMoon made their first contribution in #2777
- @binary-husky made their first contribution in #2793
- @maximevtush made their first contribution in #2804
- @hesamsheikh made their first contribution in #2676
- @loveychen made their first contribution in #2843
**Full Changelog**: v0.14.0...v0.15.0