Releases: MzeroMiko/VMamba

VMamba v0 Segmentation checkpoints

22 Feb 16:51

Semantic Segmentation on ADE20K

| Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | configs/logs/logs(ms)/ckpts |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Vanilla-VMamba-T | 512x512 | 55M | 939G 964G | UperNet@160k | 47.3 | 48.3 | config/log/log(ms)/ckpt |
| Vanilla-VMamba-S | 512x512 | 76M | 1037G 1081G | UperNet@160k | 49.5 | 50.5 | config/log/log(ms)/ckpt |
| Vanilla-VMamba-B | 512x512 | 110M | 1167G 1226G | UperNet@160k | 50.0 | 51.3 | config/log/log(ms)/ckpt |

VMamba v0 Detection checkpoints

22 Feb 16:09

Object Detection on COCO

| Backbone | #params | FLOPs | Detector | bboxAP | bboxAP50 | bboxAP75 | segmAP | segmAP50 | segmAP75 | configs/logs/ckpts |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Vanilla-VMamba-T | 42M | 262G 286G | MaskRCNN@1x | 46.5 | 68.5 | 50.7 | 42.1 | 65.5 | 45.3 | config/log/ckpt |
| Vanilla-VMamba-S | 64M | 357G 400G | MaskRCNN@1x | 48.2 | 69.7 | 52.5 | 43.0 | 66.6 | 46.4 | config/log/ckpt |
| Vanilla-VMamba-B | 96M | 482G 540G | MaskRCNN@1x | 48.6 | 70.0 | 53.1 | 43.3 | 67.1 | 46.7 | config/log/ckpt |
| Vanilla-VMamba-T | 42M | 262G 286G | MaskRCNN@3x | 48.5 | 70.0 | 52.7 | 43.2 | 66.9 | 46.4 | config/log/ckpt |
| Vanilla-VMamba-S | 64M | 357G 400G | MaskRCNN@3x | 49.7 | 70.4 | 54.2 | 44.0 | 67.6 | 47.3 | config/log/ckpt |

VMamba v0 Classification checkpoints

18 Feb 03:28

Checkpoints for VMamba (alias of vssm version 0)

These checkpoints correspond to experiments run before 2024-01-19 (#20240119).

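| name | pretrain | resolution | acc@1 | #params | FLOPs | best epoch | use ema | config |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VMamba-T | ImageNet-1K | 224x224 | 82.2 | 22M | 4.5G 5.6G | 292 | didn't add | config |
| VMamba-S | ImageNet-1K | 224x224 | 83.5 | 44M | 9.1G 11.2G | 238 | true | config |
| VMamba-B | ImageNet-1K | 224x224 | 83.2 | 75M | 15.2G 18.0G | 260 | didn't add | config |
| VMamba-B* | ImageNet-1K | 224x224 | 83.7 | 75M | 15.2G 18.0G | 241 | true | config |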

Most backbone models are trained without EMA, which does not improve their performance (see Swin-Transformer). We use EMA because our model is still under development and its hyperparameters have not been tuned.

The checkpoint used in object detection and segmentation is VMamba-B with droppath 0.5 and no EMA. VMamba-B* denotes VMamba-B with droppath 0.6 and EMA; its accuracy is 83.3 at epoch 262 without EMA and 83.7 at epoch 241 with EMA.
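For readers unfamiliar with the term, EMA here refers to keeping an exponential moving average of the model weights alongside the trained weights. The following is a minimal sketch of that update rule on plain Python lists. The function name `ema_update` and the decay values are illustrative assumptions, not taken from the VMamba training code.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """Blend the running average with the current weights.

    Hypothetical helper: new_ema = decay * ema + (1 - decay) * current.
    """
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


# Toy example: two "weights", two training steps, decay chosen for easy arithmetic.
ema = [0.0, 1.0]
for step_weights in ([1.0, 1.0], [1.0, 1.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
```

In this scheme the EMA copy is the one evaluated, and the "best epoch" can differ between the EMA and non-EMA weights, as the numbers above suggest.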

VMamba v2 Segmentation checkpoints

20 Mar 03:13

Semantic Segmentation on ADE20K

| Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | configs/logs/logs(ms)/ckpts |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VMamba-T[s2l5] | 512x512 | 62M | 948G | UperNet@160k | 48.3 | 48.6 | config/log/log(ms)/ckpt |
| VMamba-S[s2l15] | 512x512 | 82M | 1028G | UperNet@160k | 50.6 | 51.2 | config/log/log(ms)/ckpt |
| VMamba-B[s2l15] | 512x512 | 122M | 1170G | UperNet@160k | 51.0 | 51.6 | config/log/log(ms)/ckpt |
| VMamba-T[s1l8] | 512x512 | 62M | 949G | UperNet@160k | 47.9 | 48.8 | config/log/log(ms)/ckpt |

VMamba v2 Detection checkpoints

20 Mar 03:06

Object Detection on COCO

| Backbone | #params | FLOPs | Detector | bboxAP | bboxAP50 | bboxAP75 | segmAP | segmAP50 | segmAP75 | configs/logs/ckpts |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VMamba-T[s2l5] | 50M | 270G | MaskRCNN@1x | 47.4 | 69.5 | 52.0 | 42.7 | 66.3 | 46.0 | config/log/ckpt |
| VMamba-S[s2l15] | 70M | 384G | MaskRCNN@1x | 48.7 | 70.0 | 53.4 | 43.7 | 67.3 | 47.0 | config/log/ckpt |
| VMamba-B[s2l15] | 108M | 485G | MaskRCNN@1x | 49.2 | 71.4 | 54.0 | 44.1 | 68.3 | 47.7 | config/log/ckpt |
| VMamba-B[s2l15] | 108M | 485G | MaskRCNN@1x[bs8] | 49.2 | 70.9 | 53.9 | 43.9 | 67.7 | 47.6 | config/log/ckpt |
| VMamba-T[s1l8] | 50M | 271G | MaskRCNN@1x | 47.3 | 69.3 | 52.0 | 42.7 | 66.4 | 45.9 | config/log/ckpt |
| VMamba-T[s2l5] | 50M | 270G | MaskRCNN@3x | 48.9 | 70.6 | 53.6 | 43.7 | 67.7 | 46.8 | config/log/ckpt |
| VMamba-S[s2l15] | 70M | 384G | MaskRCNN@3x | 49.9 | 70.9 | 54.7 | 44.2 | 68.2 | 47.7 | config/log/ckpt |
| VMamba-T[s1l8] | 50M | 271G | MaskRCNN@3x | 48.8 | 70.4 | 53.5 | 43.7 | 67.4 | 47.0 | config/log/ckpt |

  • Models in this subsection are initialized from the models trained for classification.
  • We now calculate FLOPs with the algorithm @albertgu provides, which gives larger numbers than the previous calculation (which was based on the selective_scan_ref function and ignored the hardware-aware algorithm).
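To make the FLOPs note more concrete, here is a rough sketch of what an analytic FLOPs estimate for a selective scan might look like. Everything in it is an assumption for illustration: the function name `selective_scan_flops`, the constant of about 9 FLOPs per (batch, position, channel, state) element, and the extra terms for the optional `D` skip and `z` gate. None of these is confirmed by these release notes.

```python
def selective_scan_flops(B=1, L=256, D=768, N=16, with_D=True, with_Z=False):
    """Hypothetical analytic FLOPs estimate for one selective scan.

    B: batch size, L: sequence length, D: channels, N: state dimension.
    ASSUMPTION: roughly 9 FLOPs per (B, L, D, N) element for the scan itself.
    """
    flops = 9 * B * L * D * N
    if with_D:
        # ASSUMPTION: the skip connection adds one multiply per (B, D, L) element.
        flops += B * D * L
    if with_Z:
        # ASSUMPTION: the gate adds one multiply per (B, D, L) element.
        flops += B * D * L
    return flops


# Tiny worked example: 9*1*4*2*3 + 1*2*4 = 224
example = selective_scan_flops(B=1, L=4, D=2, N=3, with_D=True, with_Z=False)
```

An estimate of this kind depends only on tensor shapes, so it can differ from a count obtained by tracing a reference implementation operation by operation.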

VMamba v2 Classification checkpoints

16 Mar 08:47

Classification on ImageNet-1K

| name | pretrain | resolution | acc@1 | #params | FLOPs | TP. | Train TP. | configs/logs/ckpts |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VMamba-T[s2l5] | ImageNet-1K | 224x224 | 82.5 | 31M | 4.9G | 1340 | 464 | config/log/ckpt |
| VMamba-S[s2l15] | ImageNet-1K | 224x224 | 83.6 | 50M | 8.7G | 877 | 314 | config/log/ckpt |
| VMamba-B[s2l15] | ImageNet-1K | 224x224 | 83.9 | 89M | 15.4G | 646 | 247 | config/log/ckpt |
| VMamba-T[s1l8] | ImageNet-1K | 224x224 | 82.6 | 30M | 4.9G | 1686 | 571 | config/log/ckpt |
| VMamba-S[s1l20] | ImageNet-1K | 224x224 | 83.3 | 49M | 8.6G | 1106 | 390 | config/log/ckpt |
| VMamba-B[s1l20] | ImageNet-1K | 224x224 | 83.8 | 87M | 15.2G | 827 | 313 | config/log/ckpt |

  • Models in this subsection are trained from scratch with random or manual initialization. The hyper-parameters are inherited from Swin, except for drop_path_rate and EMA. All models are trained with EMA except for Vanilla-VMamba-T.
  • TP. (throughput) and Train TP. (training throughput) are measured on an A100 GPU paired with an AMD EPYC 7542 CPU, with batch size 128. Train TP. is tested with mix-resolution and excludes the time consumed by optimizers.
  • FLOPs and parameters are now counted with the head included (previous versions excluded the head, so the numbers rise slightly).
  • We calculate FLOPs with the algorithm @albertgu provides, which gives larger numbers than the previous calculation (which was based on the selective_scan_ref function and ignored the hardware-aware algorithm).
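As an illustration of how a throughput figure like TP. can be obtained, the sketch below times a fixed number of batches after a short warm-up and divides the number of samples by the elapsed time. It is a generic CPU stand-in, not the script used for the numbers above. `measure_throughput` and `dummy_step` are hypothetical names, and a real GPU measurement would also need to synchronize the device before reading the clock.

```python
import time


def measure_throughput(step_fn, batch_size=128, warmup=2, iters=10):
    """Return samples per second for a callable that processes one batch.

    Hypothetical helper: warm-up iterations are run first and not timed.
    """
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


calls = []


def dummy_step():
    # Stand-in for one forward pass over a batch; does a little CPU work.
    calls.append(sum(range(10_000)))


throughput = measure_throughput(dummy_step, batch_size=128, warmup=2, iters=10)
```

For a training throughput that excludes optimizer time, the timed callable would cover only the forward and backward passes.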

Checkpoints for nightly builds!

22 Feb 02:13

| name | pretrain | resolution | acc@1 | #params | FLOPs | best epoch | use ema | config |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VMamba-T | ImageNet-1K | 224x224 | 82.5 | 32M | 5G | 258 | true | config |

We use EMA because our model is still under development and its hyperparameters have not been tuned.

This is a pre-release