Better version

This implementation does not follow strict coding standards and may contain bugs. For a more polished implementation, see train_diffusion_dpo_sdxl.py in the official diffusers repository.

AI Feedback-Based Self-Training Direct Preference Optimization

Dataset Details

Num examples = 37180
Num Epochs = 3

Compared To Human Feedback Model

Our model's outputs stay closer to those of SDXL-Base, with improved image detail. The model released with the original paper shows better color and detail and is more in line with human preferences. This reflects a characteristic of self-training on the original model: it optimizes toward AI preferences while preserving the original model's capabilities. Training on human preference data, by contrast, ties output quality closely to the human preference dataset.
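As a rough illustration of the self-training idea above, the sketch below turns several candidates generated for one prompt into a (chosen, rejected) pair using an AI scorer. The names `build_preference_pair` and `score_fn` are hypothetical and are not taken from this repository; the actual pipeline may select pairs differently.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_preference_pair(
    candidates: Sequence[T],
    score_fn: Callable[[T], float],
) -> Tuple[T, T]:
    """Return (chosen, rejected) for one prompt.

    Assumption for this sketch: the highest-scoring candidate is treated as
    the preferred sample and the lowest-scoring one as the rejected sample.
    `score_fn` stands in for any AI preference scorer.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    # Sort ascending by score: first item is the worst, last is the best.
    ranked = sorted(candidates, key=score_fn)
    return ranked[-1], ranked[0]
```

In practice the candidates would be images sampled from the base model for the same prompt, so the preference data comes from the model's own outputs instead of human labels.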

Acknowledgement

This work is based on the method from "Diffusion Model Alignment Using Direct Preference Optimization".
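For readers unfamiliar with that method, here is a minimal sketch of a Diffusion-DPO style loss for a single preference pair, written in plain Python. It assumes the per-sample noise-prediction errors (mean squared error against the true noise) have already been computed for the trained model and for a frozen reference model. The function name, the argument names, and the 0.5 scale factor are choices made for this sketch, and `beta` is a hyperparameter whose value is not stated in this README.

```python
import math


def diffusion_dpo_loss(
    model_err_w: float,  # trained model's denoising error on the preferred image
    model_err_l: float,  # trained model's denoising error on the rejected image
    ref_err_w: float,    # reference model's denoising error on the preferred image
    ref_err_l: float,    # reference model's denoising error on the rejected image
    beta: float,         # strength of the preference term (assumed hyperparameter)
) -> float:
    """Return -log(sigmoid(z)) for one pair, where z rewards the trained model
    for lowering its error on the preferred image relative to the reference."""
    model_diff = model_err_w - model_err_l
    ref_diff = ref_err_w - ref_err_l
    z = -0.5 * beta * (model_diff - ref_diff)
    # -log(sigmoid(z)) == softplus(-z); computed in a numerically stable way.
    x = -z
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the trained model matches the reference, `z` is zero and the loss equals `log(2)`. The loss drops below that value once the trained model fits the preferred image better than the reference does, relative to the rejected image.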


qqingzheng/AI-Self-Training-DPO-SDXL