This repository is a modification of fast-DiT. It provides two alternative transformer blocks: the first is based on an MoE (Mixture of Experts) module, and the second is a re-implementation of the bidirectional Mamba (bi-mamba) block, inspired by DIFFUSSM and Vim.
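For intuition, a DiT-style block whose feed-forward MLP is swapped for a mixture of experts might look roughly like the minimal sketch below. This is an illustrative assumption, not the code in `models.py`: the class name `MoEBlock`, the `num_experts` parameter, and the top-1 routing are all hypothetical, and the adaLN conditioning used by DiT is omitted for brevity.

```python
# Hypothetical sketch of a DiT-style block with a top-1 routed MoE feed-forward.
# Names and routing scheme are assumptions; see models.py for the actual blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x):
        return self.net(x)

class MoEBlock(nn.Module):
    """Self-attention followed by a top-1 routed mixture-of-experts MLP."""
    def __init__(self, dim, num_heads, num_experts=4, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.router = nn.Linear(dim, num_experts)      # per-token gating scores
        self.experts = nn.ModuleList(
            [Expert(dim, int(dim * mlp_ratio)) for _ in range(num_experts)]
        )

    def forward(self, x):                              # x: (B, N, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x)
        gate = F.softmax(self.router(h), dim=-1)       # (B, N, num_experts)
        top_w, top_idx = gate.max(dim=-1)              # top-1 expert per token
        out = torch.zeros_like(h)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                        # tokens routed to expert e
            if mask.any():
                out[mask] = expert(h[mask]) * top_w[mask].unsqueeze(-1)
        return x + out
```

With top-1 routing, each token passes through a single expert, so per-token compute stays close to a dense MLP while parameter count scales with the number of experts.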
Versions 2 and 3 of the bi-mamba block are implemented in `models.py`; see that file for details.
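As a rough illustration of the bidirectional idea (in the spirit of Vim/DIFFUSSM), the sketch below scans the token sequence with Mamba in both directions and merges the results. It assumes the `mamba_ssm` package and is not the repository's v2/v3 implementation.

```python
# Hypothetical bidirectional Mamba block; not the v2/v3 code from models.py.
# Assumes the `mamba_ssm` package is installed.
import torch.nn as nn
from mamba_ssm import Mamba

class BiMambaBlock(nn.Module):
    """Runs Mamba over the sequence forward and backward, then sums the outputs."""
    def __init__(self, dim, d_state=16, d_conv=4, expand=2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fwd = Mamba(d_model=dim, d_state=d_state, d_conv=d_conv, expand=expand)
        self.bwd = Mamba(d_model=dim, d_state=d_state, d_conv=d_conv, expand=expand)

    def forward(self, x):                        # x: (B, N, dim)
        h = self.norm(x)
        out_fwd = self.fwd(h)                    # scan left-to-right
        out_bwd = self.bwd(h.flip(1)).flip(1)    # scan right-to-left, restore order
        return x + out_fwd + out_bwd             # residual merge of both directions
```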
Training has not been fully completed yet. Some preliminary experimental results:
| Model | Train Steps | FID-50K (PyTorch Training) | PyTorch Global Training Seed |
|---|---|---|---|
| XL/2 | 400K | 18.1 | 42 |
| B/4 | 400K | 68.9 | 42 |
| B/4 | 400K | 68.3 | 100 |
| DiM-B/4 | 400K | 58.6 | 0 |
| DiM-L/2 | 400K | 19.8 | 0 |
To extract ImageNet features with 1 GPU on one node:

```bash
bash extract_feature.sh
```
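For reference, DiT-style pipelines typically pre-encode ImageNet images into Stable Diffusion VAE latents and cache them to disk. The Python sketch below shows that step using the `diffusers` VAE; the paths, batch size, and file layout are assumptions, and `extract_feature.sh` wraps the repository's own script, which may differ.

```python
# Hypothetical single-GPU feature-extraction loop; extract_feature.sh wraps the
# repository's actual script, which may use different preprocessing and layout.
import os
import torch
import numpy as np
from diffusers.models import AutoencoderKL
from torchvision import datasets, transforms

device = "cuda"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device).eval()

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),   # map pixel values to [-1, 1]
])
dataset = datasets.ImageFolder("/path/to/imagenet/train", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, num_workers=4)

os.makedirs("features", exist_ok=True)
with torch.no_grad():
    for i, (images, labels) in enumerate(loader):
        # Encode 256x256 images into 32x32x4 latents, scaled as in DiT.
        latents = vae.encode(images.to(device)).latent_dist.sample().mul_(0.18215)
        np.save(f"features/latents_{i}.npy", latents.cpu().numpy())
        np.save(f"features/labels_{i}.npy", labels.numpy())
```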
To launch DiT-XL/2 (256x256) training with N GPUs on one node:

```bash
bash train.sh
```
To sample and evaluate, run:

```bash
bash sample_ddp.sh
```

This generates a folder of samples as well as a `.npz` file that can be used directly with ADM's TensorFlow evaluation suite to compute FID, Inception Score, and other metrics.