Would Diffusion-RPO be a better fit for Stable Diffusion than DPO? ORPO can be used to replace the SFT/RLHF/DPO pipeline in LLMs, since with SFT alone even "bad examples" (artifacts and misalignment) get baked into the fine-tuning.
https://arxiv.org/abs/2406.06382
https://arxiv.org/abs/2403.07691
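For reference, a rough side-by-side of the two objectives (notation mine, simplified from the papers). DPO needs a frozen reference model $\pi_{\text{ref}}$:

$$\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]$$

ORPO is reference-free and just adds an odds-ratio penalty on top of the SFT loss, which is why it can stand in for the whole SFT + preference stage:

$$\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{SFT}} + \lambda\,\mathbb{E}\!\left[-\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)\right], \qquad \text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$$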
Can RPO and DPO be applied to LoRA refinement, rather than just to full checkpoint models? That way, LoRAs trained on low-resource topics could be further adjusted after the initial training is done (not even getting into POA and other down-sizing methods). A rough sketch of this is below. https://www.arxiv.org/abs/2408.01031
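For the LoRA question, here's a minimal sketch of what a Diffusion-DPO-style update restricted to LoRA weights could look like, assuming `diffusers` + `peft`. The model id, target modules, and `beta` value are illustrative assumptions on my part, not taken from any of the papers above; the frozen base UNet doubles as the reference model, so only the adapter gets pushed by the preference signal.

```python
# Rough sketch (untested): Diffusion-DPO-style preference loss applied to a LoRA
# on top of a frozen SD UNet, so only the adapter weights are trainable.
import torch
import torch.nn.functional as F
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet.requires_grad_(False)  # frozen base also serves as the implicit reference model

lora_cfg = LoraConfig(r=8, lora_alpha=8,
                      target_modules=["to_q", "to_k", "to_v", "to_out.0"])
unet = get_peft_model(unet, lora_cfg)  # only the LoRA weights are trainable

beta = 2000.0  # preference strength (assumed; Diffusion-DPO reports values around this scale)

def dpo_step(noisy_w, noisy_l, noise, timesteps, text_emb):
    """One preference step on a (preferred, rejected) pair of noised latents."""
    # Denoising error of the LoRA-adapted model on preferred / rejected latents
    err_w = F.mse_loss(unet(noisy_w, timesteps, text_emb).sample, noise,
                       reduction="none").mean([1, 2, 3])
    err_l = F.mse_loss(unet(noisy_l, timesteps, text_emb).sample, noise,
                       reduction="none").mean([1, 2, 3])

    # Same errors with the adapter disabled -> frozen base acts as the reference model
    with torch.no_grad(), unet.disable_adapter():
        ref_w = F.mse_loss(unet(noisy_w, timesteps, text_emb).sample, noise,
                           reduction="none").mean([1, 2, 3])
        ref_l = F.mse_loss(unet(noisy_l, timesteps, text_emb).sample, noise,
                           reduction="none").mean([1, 2, 3])

    # Push the adapter to improve on "win" latents relative to "lose" latents
    logits = -beta * ((err_w - ref_w) - (err_l - ref_l))
    return -F.logsigmoid(logits).mean()
```

The nice property here is that no separate reference copy of the UNet is needed: disabling the adapter recovers the base weights, which keeps memory low enough to make this plausible as a post-hoc refinement pass on an already-trained LoRA.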
What other use cases are there for DPO? For example, there is this work on using it alongside "negative prompts": https://arxiv.org/abs/2407.01606v1