Releases · MzeroMiko/VMamba
VMamba v0 Segmentation checkpoints
Semantic Segmentation on ADE20K
Backbone | Input | #params | Segmentor | mIoU(SS) | mIoU(MS) | configs/logs/logs(ms)/ckpts
---|---|---|---|---|---|---
Vanilla-VMamba-T | 512x512 | 55M | UperNet@160k | 47.3 | 48.3 | config/log/log(ms)/ckpt
Vanilla-VMamba-S | 512x512 | 76M | UperNet@160k | 49.5 | 50.5 | config/log/log(ms)/ckpt
Vanilla-VMamba-B | 512x512 | 110M | UperNet@160k | 50.0 | 51.3 | config/log/log(ms)/ckpt
VMamba v0 Detection checkpoints
Object Detection on COCO
Backbone | #params | Detector | bboxAP | bboxAP50 | bboxAP75 | segmAP | segmAP50 | segmAP75 | configs/logs/ckpts
---|---|---|---|---|---|---|---|---|---
Vanilla-VMamba-T | 42M | MaskRCNN@1x | 46.5 | 68.5 | 50.7 | 42.1 | 65.5 | 45.3 | config/log/ckpt
Vanilla-VMamba-S | 64M | MaskRCNN@1x | 48.2 | 69.7 | 52.5 | 43.0 | 66.6 | 46.4 | config/log/ckpt
Vanilla-VMamba-B | 96M | MaskRCNN@1x | 48.6 | 70.0 | 53.1 | 43.3 | 67.1 | 46.7 | config/log/ckpt
Vanilla-VMamba-T | 42M | MaskRCNN@3x | 48.5 | 70.0 | 52.7 | 43.2 | 66.9 | 46.4 | config/log/ckpt
Vanilla-VMamba-S | 64M | MaskRCNN@3x | 49.7 | 70.4 | 54.2 | 44.0 | 67.6 | 47.3 | config/log/ckpt
VMamba v0 Classification checkpoints
Checkpoints for VMamba (alias of vssm version 0).
These checkpoints correspond to experiments conducted before 2024/01/19.
name | pretrain | resolution | acc@1 | #params | best epoch | use ema | config
---|---|---|---|---|---|---|---
VMamba-T | ImageNet-1K | 224x224 | 82.2 | 22M | 292 | false | config
VMamba-S | ImageNet-1K | 224x224 | 83.5 | 44M | 238 | true | config
VMamba-B | ImageNet-1K | 224x224 | 83.2 | 75M | 260 | false | config
VMamba-B* | ImageNet-1K | 224x224 | 83.7 | 75M | 241 | true | config
Most backbone models are trained without EMA, which does not improve performance (see Swin-Transformer). We use EMA because our model is still under development and has not yet undergone hyperparameter tuning.
The checkpoint used in object detection and segmentation is VMamba-B with drop path 0.5 and no EMA. VMamba-B* denotes VMamba-B with drop path 0.6 and EMA; it reaches 83.3 without EMA (epoch 262) and 83.7 with EMA (epoch 241).
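For reference, "EMA" here means keeping an exponential moving average of the model weights during training and evaluating the averaged copy. A minimal PyTorch sketch of the idea (the class name and the decay of 0.9999 are illustrative assumptions, not values taken from this repo):

```python
import copy
import torch

class WeightEma:
    """Exponential moving average of model weights (illustrative sketch).

    decay=0.9999 is a common default, not necessarily VMamba's value.
    """
    def __init__(self, model, decay=0.9999):
        self.ema = copy.deepcopy(model).eval()   # averaged copy, used for eval
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # After each optimizer step: ema_w <- decay * ema_w + (1 - decay) * w
        for ema_t, t in zip(self.ema.state_dict().values(),
                            model.state_dict().values()):
            if ema_t.is_floating_point():
                ema_t.mul_(self.decay).add_(t, alpha=1 - self.decay)
            else:
                ema_t.copy_(t)                   # integer buffers are copied as-is
```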
VMamba v2 Segmentation checkpoints
Semantic Segmentation on ADE20K
Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | configs/logs/logs(ms)/ckpts
---|---|---|---|---|---|---|---
VMamba-T[s2l5] | 512x512 | 62M | 948G | UperNet@160k | 48.3 | 48.6 | config/log/log(ms)/ckpt
VMamba-S[s2l15] | 512x512 | 82M | 1028G | UperNet@160k | 50.6 | 51.2 | config/log/log(ms)/ckpt
VMamba-B[s2l15] | 512x512 | 122M | 1170G | UperNet@160k | 51.0 | 51.6 | config/log/log(ms)/ckpt
VMamba-T[s1l8] | 512x512 | 62M | 949G | UperNet@160k | 47.9 | 48.8 | config/log/log(ms)/ckpt
VMamba v2 Detection checkpoints
Object Detection on COCO
Backbone | #params | FLOPs | Detector | bboxAP | bboxAP50 | bboxAP75 | segmAP | segmAP50 | segmAP75 | configs/logs/ckpts
---|---|---|---|---|---|---|---|---|---|---
VMamba-T[s2l5] | 50M | 270G | MaskRCNN@1x | 47.4 | 69.5 | 52.0 | 42.7 | 66.3 | 46.0 | config/log/ckpt
VMamba-S[s2l15] | 70M | 384G | MaskRCNN@1x | 48.7 | 70.0 | 53.4 | 43.7 | 67.3 | 47.0 | config/log/ckpt
VMamba-B[s2l15] | 108M | 485G | MaskRCNN@1x | 49.2 | 71.4 | 54.0 | 44.1 | 68.3 | 47.7 | config/log/ckpt
VMamba-B[s2l15] | 108M | 485G | MaskRCNN@1x[bs8] | 49.2 | 70.9 | 53.9 | 43.9 | 67.7 | 47.6 | config/log/ckpt
VMamba-T[s1l8] | 50M | 271G | MaskRCNN@1x | 47.3 | 69.3 | 52.0 | 42.7 | 66.4 | 45.9 | config/log/ckpt
VMamba-T[s2l5] | 50M | 270G | MaskRCNN@3x | 48.9 | 70.6 | 53.6 | 43.7 | 67.7 | 46.8 | config/log/ckpt
VMamba-S[s2l15] | 70M | 384G | MaskRCNN@3x | 49.9 | 70.9 | 54.7 | 44.2 | 68.2 | 47.7 | config/log/ckpt
VMamba-T[s1l8] | 50M | 271G | MaskRCNN@3x | 48.8 | 70.4 | 53.5 | 43.7 | 67.4 | 47.0 | config/log/ckpt
- Models in this subsection are initialized from the models trained in classification.
- We now calculate FLOPs with the algorithm @albertgu provides, which yields larger numbers than the previous calculation (based on the selective_scan_ref function, which ignores the hardware-aware algorithm); a back-of-the-envelope sketch follows below.
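For context, the commonly cited estimate for the selective-scan core counts roughly 9·B·L·D·N multiply-accumulates per forward pass. A rough sketch under that assumption (the function name is ours, and applying the formula unchanged to VMamba's exact kernel is an assumption):

```python
def selective_scan_flops(B, L, D, N, with_D=True):
    """Rough forward-pass FLOPs for a selective scan (approximation).

    B: batch size, L: sequence length, D: channel dim, N: state dim.
    The 9 * B * L * D * N core term follows the estimate discussed in
    https://github.com/state-spaces/mamba/issues/110; exact bookkeeping
    for skip/gate terms differs between implementations.
    """
    flops = 9 * B * L * D * N   # scan recurrence plus its input/output products
    if with_D:
        flops += B * D * L      # the D * u skip connection
    return flops

# e.g. one scan over a 56x56 feature map (L = 3136) with D=96, N=16:
print(selective_scan_flops(B=1, L=56 * 56, D=96, N=16) / 1e9, "GFLOPs")
```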
VMamba v2 Classification checkpoints
Classification on ImageNet-1K
name | pretrain | resolution | acc@1 | #params | FLOPs | TP. | Train TP. | configs/logs/ckpts
---|---|---|---|---|---|---|---|---
VMamba-T[s2l5] | ImageNet-1K | 224x224 | 82.5 | 31M | 4.9G | 1340 | 464 | config/log/ckpt
VMamba-S[s2l15] | ImageNet-1K | 224x224 | 83.6 | 50M | 8.7G | 877 | 314 | config/log/ckpt
VMamba-B[s2l15] | ImageNet-1K | 224x224 | 83.9 | 89M | 15.4G | 646 | 247 | config/log/ckpt
VMamba-T[s1l8] | ImageNet-1K | 224x224 | 82.6 | 30M | 4.9G | 1686 | 571 | config/log/ckpt
VMamba-S[s1l20] | ImageNet-1K | 224x224 | 83.3 | 49M | 8.6G | 1106 | 390 | config/log/ckpt
VMamba-B[s1l20] | ImageNet-1K | 224x224 | 83.8 | 87M | 15.2G | 827 | 313 | config/log/ckpt
- Models in this subsection are trained from scratch with random or manual initialization. The hyperparameters are inherited from Swin, except for drop_path_rate and EMA. All models are trained with EMA except for Vanilla-VMamba-T.
- TP. (throughput) and Train TP. (train throughput) are assessed on an A100 GPU paired with an AMD EPYC 7542 CPU, with batch size 128 (a measurement sketch follows this list). Train TP. is tested with mixed resolution and excludes the time consumed by the optimizers.
- FLOPs and parameters now include the classification head (previous versions excluded it, so the numbers rise a little).
- We calculate FLOPs with the algorithm @albertgu provides, which yields larger numbers than the previous calculation (based on the selective_scan_ref function, which ignores the hardware-aware algorithm).
Checkpoints for nightly builds!
name | pretrain | resolution | acc@1 | #params | FLOPs | best epoch | use ema | config |
---|---|---|---|---|---|---|---|---|
VMamba-T | ImageNet-1K | 224x224 | 82.5 | 32M | 5G | 258 | true | config |
We use EMA because our model is still under development and has not yet undergone hyperparameter tuning.
This is a pre-release.