
FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

FontDiffuser_LOGO

arXiv preprint Gradio demo Homepage Code

🔥 Model Zoo • 🛠️ Installation • 🏋️ Training • 📺 Sampling • 📱 Run WebUI

🌟 Highlights

Vis_1 Vis_2

  • We propose FontDiffuser, which can generate unseen characters and styles, and can be extended to cross-lingual generation, such as Chinese to Korean.
  • FontDiffuser excels at generating complex characters and handling large style variations, and it achieves state-of-the-art performance.
  • The results generated by FontDiffuser can be fed directly into InstructPix2Pix for decoration, as shown in the figure above.
  • We release the 💻Hugging Face Demo online! Welcome to try it out!

📅 News

  • 2024.01.27: The phase 2 training script is released.
  • 2023.12.20: Our repository is public! 👍🤗
  • 2023.12.19: 🔥🎉 The 💻Hugging Face Demo is public! Welcome to try it out!
  • 2023.12.16: The Gradio app demo is released.
  • 2023.12.10: The source code with phase 1 training and sampling is released.
  • 2023.12.09: 🎉🎉 Our paper is accepted by AAAI 2024.
  • Previously: Our Recommendations-of-Diffusion-for-Text-Image repo is public; it contains a collection of recent papers on diffusion models for text/image generation tasks. Welcome to check it out!

🔥 Model Zoo

Model          Checkpoint                     Status
FontDiffuser   GoogleDrive / BaiduYun:gexg    Released
SCR            GoogleDrive / BaiduYun:gexg    Released

🚧 TODO List

  • Add phase 1 training and sampling script.
  • Add WebUI demo.
  • Push demo to Hugging Face.
  • Add phase 2 training script and checkpoint.
  • Add the pre-training script of the SCR module.
  • Combine with InstructPix2Pix.

🛠️ Installation

Prerequisites (Recommended)

  • Linux
  • Python 3.9
  • PyTorch 1.13.1
  • CUDA 11.7

Environment Setup

Clone this repo:

git clone https://github.com/yeungchenwa/FontDiffuser.git

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser

Step 2: Install the matching version of PyTorch following the instructions here.

# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Step 3: Install the required packages.

pip install -r requirements.txt
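
After installation, a quick sanity check (a minimal sketch using only standard PyTorch calls) can confirm that the CUDA build was picked up:

import torch

print(torch.__version__)          # expect: 1.13.1+cu117
print(torch.cuda.is_available())  # expect: True on a CUDA 11.7 machine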

πŸ‹οΈ Training

Data Construction

The training data directory tree should look as follows (data examples are provided in data_examples/train/):

├── data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │   ├── style0+char0.png
│           │   ├── style0+char1.png
│           │   └── ...
│           ├── style1
│           │   ├── style1+char0.png
│           │   ├── style1+char1.png
│           │   └── ...
│           ├── style2
│           │   ├── style2+char0.png
│           │   ├── style2+char1.png
│           │   └── ...
│           └── ...
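
For reference, below is a minimal sketch of how a tree like this can be traversed to build (content, target) training pairs. The path layout follows the tree above; the function name and details are illustrative, not the repository's actual data-loading code.

import os

def build_pairs(data_root="data_examples/train"):
    # Pair each content glyph with every styled rendering of the same
    # character. Assumes the layout above: ContentImage/charN.png and
    # TargetImage/styleM/styleM+charN.png.
    content_dir = os.path.join(data_root, "ContentImage")
    target_dir = os.path.join(data_root, "TargetImage")
    pairs = []
    for style in sorted(os.listdir(target_dir)):
        style_dir = os.path.join(target_dir, style)
        for fname in sorted(os.listdir(style_dir)):
            char_name = fname.split("+", 1)[1]  # "style0+char1.png" -> "char1.png"
            content_path = os.path.join(content_dir, char_name)
            if os.path.exists(content_path):
                pairs.append((content_path, os.path.join(style_dir, fname)))
    return pairs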

Training Configuration

Before running any of the training scripts below (all three modes), set up the training configuration (e.g., distributed training) via:

accelerate config

Training - Pretraining of SCR

Coming Soon ...

Training - Phase 1

sh train_phase_1.sh
  • data_root: The data root directory, e.g., ./data_examples.
  • output_dir: The directory where training logs and checkpoints are saved.
  • resolution: The resolution of the UNet in our diffusion model.
  • style_image_size: The resolution of the style image; it may differ from resolution.
  • content_image_size: The resolution of the content image; it should be the same as resolution.
  • channel_attn: Whether to use channel attention in the MCA block.
  • train_batch_size: The training batch size.
  • max_train_steps: The maximum number of training steps.
  • learning_rate: The learning rate for training.
  • ckpt_interval: The checkpoint saving interval during training.
  • drop_prob: The probability of dropping the condition during training, used for classifier-free guidance (see the sketch after this list).
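
For intuition about drop_prob: classifier-free guidance training randomly replaces the conditioning inputs with a null condition, so the model also learns an unconditional prediction. A minimal sketch of this idea (names are illustrative, not the repository's code):

import torch

def maybe_drop_condition(content_feat, style_feat, drop_prob=0.1):
    # With probability drop_prob, zero out the condition so the model
    # also learns the unconditional denoising direction.
    if torch.rand(1).item() < drop_prob:
        content_feat = torch.zeros_like(content_feat)
        style_feat = torch.zeros_like(style_feat)
    return content_feat, style_feat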

Training - Phase 2

After phase 1 training, put the trained checkpoint files (unet.pth, content_encoder.pth, and style_encoder.pth) into the directory phase_1_ckpt; phase 2 resumes from these parameters.

sh train_phase_2.sh
  • phase_2: Flag indicating phase 2 training.
  • phase_1_ckpt_dir: The directory holding the model checkpoints saved after phase 1 training.
  • scr_ckpt_path: The checkpoint path of the pre-trained SCR module; you can download it from the 🔥Model Zoo above.
  • sc_coefficient: The coefficient of the style contrastive loss used for supervision.
  • num_neg: The number of negative samples; defaults to 16 (see the sketch after this list).
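
For intuition about sc_coefficient and num_neg: a style contrastive loss is typically an InfoNCE-style objective that pulls the generated image's style embedding toward the reference style and pushes it away from num_neg negative styles. A minimal sketch under that assumption (illustrative, not the repository's SCR implementation):

import torch
import torch.nn.functional as F

def style_contrastive_loss(anchor, positive, negatives, temperature=0.07):
    # anchor, positive: (B, D); negatives: (B, num_neg, D)
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_logit = (anchor * positive).sum(dim=-1, keepdim=True)   # (B, 1)
    neg_logits = torch.einsum("bd,bnd->bn", anchor, negatives)  # (B, num_neg)
    logits = torch.cat([pos_logit, neg_logits], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)  # the positive sits at index 0

# Sketch of how it would enter the total objective:
# loss = diffusion_loss + sc_coefficient * style_contrastive_loss(...)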

📺 Sampling

Step 1 => Prepare the checkpoint

Option (1): Download the checkpoint from GoogleDrive / BaiduYun:gexg, then place the ckpt directory in the repository root; it should contain the files unet.pth, content_encoder.pth, and style_encoder.pth.
Option (2): Place your own re-trained checkpoint folder ckpt in the repository root, containing the same three files.

Step 2 => Run the script

(1) Sampling an image from a content image and a reference image.

sh script/sample_content_image.sh
  • ckpt_dir: The directory holding the model checkpoints.
  • content_image_path: The content/source image path.
  • style_image_path: The style/reference image path.
  • save_image: Set to True to save the outputs as images.
  • save_image_dir: The directory where images are saved; the outputs include an out_single.png and an out_with_cs.png.
  • device: The sampling device; GPU acceleration is recommended.
  • guidance_scale: The classifier-free guidance scale used at sampling time (see the sketch after this list).
  • num_inference_steps: The number of inference steps for DPM-Solver++.
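
To make guidance_scale concrete: at each denoising step, classifier-free guidance combines an unconditional and a conditional prediction. A minimal sketch of the standard formulation (not the repository's exact code):

def guided_prediction(eps_uncond, eps_cond, guidance_scale):
    # Extrapolate from the unconditional prediction toward the conditional
    # one; guidance_scale = 1.0 recovers the plain conditional prediction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)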

(2) Sampling an image from a content character.
Note: You may need a .ttf file containing a large set of Chinese characters; one can be downloaded from BaiduYun:wrth.

sh script/sample_content_character.sh
  • character_input: If set to True, a character string is used as the content/source input.
  • content_character: The content/source character string.
  • The other parameters are the same as in option (1) above.
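
When character_input is set, the content image has to be rendered from a font file, which is why the .ttf above is needed. As a rough illustration of how a glyph can be rasterized from a .ttf with Pillow (a sketch, not necessarily how the script does it):

from PIL import Image, ImageDraw, ImageFont  # requires Pillow >= 8.0 for textbbox

def render_char(char, ttf_path, size=96):
    # Render a single character, roughly centered on a white canvas.
    font = ImageFont.truetype(ttf_path, int(size * 0.8))
    img = Image.new("L", (size, size), color=255)
    draw = ImageDraw.Draw(img)
    left, top, right, bottom = draw.textbbox((0, 0), char, font=font)
    x = (size - (right - left)) / 2 - left
    y = (size - (bottom - top)) / 2 - top
    draw.text((x, y), char, font=font, fill=0)
    return img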

📱 Run WebUI

(1) Sampling by FontDiffuser

gradio gradio_app.py


(2) Sampling by FontDiffuser and Rendering by InstructPix2Pix

Coming Soon ...

🌄 Gallery

Characters of hard complexity

vis_hard

Characters of medium complexity

vis_medium

Characters of easy complexity

vis_easy

Cross-Lingual Generation (Chinese to Korean)

vis_korean

💙 Acknowledgement

Copyright

Citation

@inproceedings{yang2024fontdiffuser,
  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2024}
}

⭐ Star Rising

Star Rising
