
# DiT with MoE / Mamba Block

This repository is a modification of fast-DiT. It provides two alternative transformer blocks: the first is built around an MoE (Mixture of Experts) module, and the second is a re-implementation of the bi-mamba block, inspired by DIFFUSSM and Vim.
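To give a rough idea of the MoE variant, the sketch below shows a minimal top-k routed mixture-of-experts MLP of the kind that typically replaces the standard DiT feed-forward layer. It is only an illustration under assumed names and hyperparameters; the actual module lives in models.py.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Minimal top-k mixture-of-experts MLP (illustrative, not the repo's exact module)."""
    def __init__(self, dim, hidden_dim, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)   # token -> expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                # x: (B, N, D)
        scores = self.router(x)                          # (B, N, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```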

Versions 2 and 3 of the bi-mamba block are structured as follows:

(figure: bi-mamba v2 block diagram)

See models.py for details.
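For orientation only, a bidirectional Mamba block in the spirit of Vim/DIFFUSSM runs a selective scan over the token sequence in both directions and merges the two passes. The sketch below assumes the `mamba_ssm` package and a simple residual additive merge; the actual v2/v3 merge rules are defined in models.py.

```python
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the official mamba_ssm package is installed

class BiMambaBlock(nn.Module):
    """Illustrative bidirectional Mamba block: forward scan + backward scan, then merge.
    The real v2/v3 variants are in models.py; this is only a sketch."""
    def __init__(self, dim, d_state=16, d_conv=4, expand=2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fwd = Mamba(d_model=dim, d_state=d_state, d_conv=d_conv, expand=expand)
        self.bwd = Mamba(d_model=dim, d_state=d_state, d_conv=d_conv, expand=expand)

    def forward(self, x):                      # x: (B, N, D) token sequence
        h = self.norm(x)
        out_fwd = self.fwd(h)                  # scan left-to-right
        out_bwd = self.bwd(h.flip(1)).flip(1)  # scan right-to-left, restore order
        return x + out_fwd + out_bwd           # residual + simple additive merge
```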

A fully trained version is not available yet.

Some experimental results so far:

| Model   | Train Steps | FID-50K (PyTorch Training) | PyTorch Global Training Seed |
|---------|-------------|----------------------------|------------------------------|
| XL/2    | 400K        | 18.1                       | 42                           |
| B/4     | 400K        | 68.9                       | 42                           |
| B/4     | 400K        | 68.3                       | 100                          |
| DiM-B/4 | 400K        | 58.6                       | 0                            |
| DiM-L/2 | 400K        | 19.8                       | 0                            |

## Sampling of DiM-L/2 (400K Steps)

(figure: samples generated by DiM-L/2 at 400K steps)

## Preparation Before Training

To extract ImageNet features with 1 GPU on one node:

```bash
bash extract_feature.sh
```
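Conceptually (an assumption about the script's contents rather than a description of it), feature extraction encodes each ImageNet image into Stable Diffusion VAE latents and caches them on disk so training never touches raw pixels, roughly along these lines:

```python
import numpy as np
import torch
from diffusers.models import AutoencoderKL

# Illustrative only: the actual logic lives in the script invoked by extract_feature.sh.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval().cuda()

@torch.no_grad()
def encode_batch(images):                     # images: (B, 3, 256, 256), scaled to [-1, 1]
    latents = vae.encode(images.cuda()).latent_dist.sample()
    return (latents * 0.18215).cpu().numpy()  # (B, 4, 32, 32) scaled latents

# e.g. np.save(f"features/{batch_idx}.npy", encode_batch(batch))
```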

## Training

To launch DiT-XL/2 (256x256) training with N GPUs on one node:

```bash
bash train.sh
```
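Under the hood (again an assumption; check train.sh and the training script it launches), a single optimization step follows the standard DiT latent-diffusion objective: noise the cached latents at a random timestep and train the model to predict that noise. A self-contained sketch with a simple linear beta schedule:

```python
import torch
import torch.nn.functional as F

# Illustrative single training step; the repo's actual loop and loss (e.g. learned-sigma
# VLB terms) may differ. All names here are assumptions.
def make_alphas_cumprod(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    betas = torch.linspace(beta_start, beta_end, num_timesteps)
    return torch.cumprod(1.0 - betas, dim=0)

def diffusion_step(model, optimizer, latents, labels, alphas_cumprod):
    t = torch.randint(0, len(alphas_cumprod), (latents.size(0),), device=latents.device)
    noise = torch.randn_like(latents)
    a = alphas_cumprod.to(latents.device)[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * latents + (1 - a).sqrt() * noise  # forward diffusion q(x_t | x_0)
    pred = model(noisy, t, labels)                       # DiT/DiM predicts the added noise
    loss = F.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```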

## Evaluation (FID, Inception Score, etc.)

```bash
bash sample_ddp.sh
```

This generates a folder of samples as well as a .npz file that can be used directly with ADM's TensorFlow evaluation suite to compute FID, Inception Score, and other metrics.
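If the file follows the original DiT/ADM convention (an assumption; verify against the generated output), the .npz holds a single uint8 image array under the default key, which can be sanity-checked before evaluation:

```python
import numpy as np

# Path and key are illustrative; adjust to the file actually written by sample_ddp.sh.
samples = np.load("samples/DiM-L-2.npz")["arr_0"]
print(samples.shape, samples.dtype)  # expected: (num_samples, 256, 256, 3) uint8
```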