Would Diffusion-RPO be a better fit for Stable Diffusion than DPO? ORPO can be used to replace the SFT/RLHF/DPO pipeline in LLMs, since with SFT alone even "bad examples" (artifacts and misalignment) get baked into the fine-tuning.
https://arxiv.org/abs/2406.06382
https://arxiv.org/abs/2403.07691
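For reference, a rough side-by-side of the two objectives (notation mine, simplified from the papers). DPO needs a frozen reference model $\pi_{\text{ref}}$:

$$\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]$$

ORPO is reference-free and just adds an odds-ratio penalty on top of the SFT loss, which is why it can stand in for the whole SFT + preference stage:

$$\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{SFT}} + \lambda\,\mathbb{E}\!\left[-\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)\right], \qquad \text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$$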
Can RPO and DPO be applied to LoRA refinement, rather than just to full checkpoint models? That way, LoRAs trained on low-resource topics could be further adjusted after the initial training is done (not even getting into POA and other down-sizing methods). A rough sketch of this is below. https://www.arxiv.org/abs/2408.01031
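For the LoRA question, here's a minimal sketch of what a Diffusion-DPO-style update restricted to LoRA weights could look like, assuming `diffusers` + `peft`. The model id, target modules, and `beta` value are illustrative assumptions on my part, not taken from any of the papers above; the frozen base UNet doubles as the reference model, so only the adapter gets pushed by the preference signal.

```python
# Rough sketch (untested): Diffusion-DPO-style preference loss applied to a LoRA
# on top of a frozen SD UNet, so only the adapter weights are trainable.
import torch
import torch.nn.functional as F
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet.requires_grad_(False)  # frozen base also serves as the implicit reference model

lora_cfg = LoraConfig(r=8, lora_alpha=8,
                      target_modules=["to_q", "to_k", "to_v", "to_out.0"])
unet = get_peft_model(unet, lora_cfg)  # only the LoRA weights are trainable

beta = 2000.0  # preference strength (assumed; Diffusion-DPO reports values around this scale)

def dpo_step(noisy_w, noisy_l, noise, timesteps, text_emb):
    """One preference step on a (preferred, rejected) pair of noised latents."""
    # Denoising error of the LoRA-adapted model on preferred / rejected latents
    err_w = F.mse_loss(unet(noisy_w, timesteps, text_emb).sample, noise,
                       reduction="none").mean([1, 2, 3])
    err_l = F.mse_loss(unet(noisy_l, timesteps, text_emb).sample, noise,
                       reduction="none").mean([1, 2, 3])

    # Same errors with the adapter disabled -> frozen base acts as the reference model
    with torch.no_grad(), unet.disable_adapter():
        ref_w = F.mse_loss(unet(noisy_w, timesteps, text_emb).sample, noise,
                           reduction="none").mean([1, 2, 3])
        ref_l = F.mse_loss(unet(noisy_l, timesteps, text_emb).sample, noise,
                           reduction="none").mean([1, 2, 3])

    # Push the adapter to improve on "win" latents relative to "lose" latents
    logits = -beta * ((err_w - ref_w) - (err_l - ref_l))
    return -F.logsigmoid(logits).mean()
```

The nice property here is that no separate reference copy of the UNet is needed: disabling the adapter recovers the base weights, which keeps memory low enough to make this plausible as a post-hoc refinement pass on an already-trained LoRA.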
What other use cases are there for DPO? For example, there is this work on using it alongside "negative prompts": https://arxiv.org/abs/2407.01606v1