This repository is a modification of fast-DiT. It provides two alternative transformer blocks: the first is based on an MoE (Mixture of Experts) module, and the second is a re-implementation of the bidirectional Mamba (bi-mamba) block, inspired by DIFFUSSM and Vim.
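For intuition, a DiT-style block whose feed-forward MLP is swapped for a mixture of experts might look roughly like the minimal sketch below. This is an illustrative assumption, not the code in `models.py`: the class name `MoEBlock`, the `num_experts` parameter, and the top-1 routing are all hypothetical, and the adaLN conditioning used by DiT is omitted for brevity.

```python
# Hypothetical sketch of a DiT-style block with a top-1 routed MoE feed-forward.
# Names and routing scheme are assumptions; see models.py for the actual blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x):
        return self.net(x)

class MoEBlock(nn.Module):
    """Self-attention followed by a top-1 routed mixture-of-experts MLP."""
    def __init__(self, dim, num_heads, num_experts=4, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.router = nn.Linear(dim, num_experts)      # per-token gating scores
        self.experts = nn.ModuleList(
            [Expert(dim, int(dim * mlp_ratio)) for _ in range(num_experts)]
        )

    def forward(self, x):                              # x: (B, N, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x)
        gate = F.softmax(self.router(h), dim=-1)       # (B, N, num_experts)
        top_w, top_idx = gate.max(dim=-1)              # top-1 expert per token
        out = torch.zeros_like(h)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                        # tokens routed to expert e
            if mask.any():
                out[mask] = expert(h[mask]) * top_w[mask].unsqueeze(-1)
        return x + out
```

With top-1 routing, each token passes through a single expert, so per-token compute stays close to a dense MLP while parameter count scales with the number of experts.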
Versions 2 and 3 of the bi-mamba block are implemented in `models.py`; see that file for details.
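As a rough illustration of the bidirectional idea (in the spirit of Vim/DIFFUSSM), the sketch below scans the token sequence with Mamba in both directions and merges the results. It assumes the `mamba_ssm` package and is not the repository's v2/v3 implementation.

```python
# Hypothetical bidirectional Mamba block; not the v2/v3 code from models.py.
# Assumes the `mamba_ssm` package is installed.
import torch.nn as nn
from mamba_ssm import Mamba

class BiMambaBlock(nn.Module):
    """Runs Mamba over the sequence forward and backward, then sums the outputs."""
    def __init__(self, dim, d_state=16, d_conv=4, expand=2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fwd = Mamba(d_model=dim, d_state=d_state, d_conv=d_conv, expand=expand)
        self.bwd = Mamba(d_model=dim, d_state=d_state, d_conv=d_conv, expand=expand)

    def forward(self, x):                        # x: (B, N, dim)
        h = self.norm(x)
        out_fwd = self.fwd(h)                    # scan left-to-right
        out_bwd = self.bwd(h.flip(1)).flip(1)    # scan right-to-left, restore order
        return x + out_fwd + out_bwd             # residual merge of both directions
```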
Training has not been fully completed yet. Some preliminary experimental results:
| Model | Train Steps | FID-50K (PyTorch Training) | PyTorch Global Training Seed |
|---|---|---|---|
| XL/2 | 400K | 18.1 | 42 |
| B/4 | 400K | 68.9 | 42 |
| B/4 | 400K | 68.3 | 100 |
| DiM-B/4 | 400K | 58.6 | 0 |
| DiM-L/2 | 400K | 19.8 | 0 |
To extract ImageNet features with 1 GPU on one node:

```bash
bash extract_feature.sh
```
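For reference, DiT-style pipelines typically pre-encode ImageNet images into Stable Diffusion VAE latents and cache them to disk. The Python sketch below shows that step using the `diffusers` VAE; the paths, batch size, and file layout are assumptions, and `extract_feature.sh` wraps the repository's own script, which may differ.

```python
# Hypothetical single-GPU feature-extraction loop; extract_feature.sh wraps the
# repository's actual script, which may use different preprocessing and layout.
import os
import torch
import numpy as np
from diffusers.models import AutoencoderKL
from torchvision import datasets, transforms

device = "cuda"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device).eval()

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),   # map pixel values to [-1, 1]
])
dataset = datasets.ImageFolder("/path/to/imagenet/train", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, num_workers=4)

os.makedirs("features", exist_ok=True)
with torch.no_grad():
    for i, (images, labels) in enumerate(loader):
        # Encode 256x256 images into 32x32x4 latents, scaled as in DiT.
        latents = vae.encode(images.to(device)).latent_dist.sample().mul_(0.18215)
        np.save(f"features/latents_{i}.npy", latents.cpu().numpy())
        np.save(f"features/labels_{i}.npy", labels.numpy())
```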
To launch DiT-XL/2 (256x256) training with N GPUs on one node:

```bash
bash train.sh
```
To sample and evaluate, run:

```bash
bash sample_ddp.sh
```

This generates a folder of samples as well as a `.npz` file that can be used directly with ADM's TensorFlow evaluation suite to compute FID, Inception Score, and other metrics.