UNINEXT achieves superior performance on 20 benchmarks, using the same model with the same model parameters. UNINEXT has three training stages: pretraining, image-level joint training, and video-level joint training. We provide checkpoints from every stage for each backbone.
Pretraining:

Backbone | YAML | Model |
---|---|---|
ResNet-50 | obj365v2_32g_r50 | model |
ConvNeXt-Large | obj365v2_32g_convnext_large | model |
ViT-Huge | obj365v2_32g_vit_huge | model |

Image-level joint training:

Backbone | YAML | Model |
---|---|---|
ResNet-50 | image_joint_r50 | model |
ConvNeXt-Large | image_joint_convnext_large | model |
ViT-Huge | image_joint_vit_huge_32g | model |

Video-level joint training: all numbers reported in the paper (Table 1 to Table 10) use the following models.
Backbone | YAML | Model |
---|---|---|
ResNet-50 | video_joint_r50 | model |
ConvNeXt-Large | video_joint_convnext_large | model |
ViT-Huge | video_joint_vit_huge | model |
Please note that the filenames of the pretrained weights used in this stage end with `model_final_4c.pth`. To obtain these weights, run the command that matches your backbone:
python3 conversion/convert_3c_to_4c_pth.py # ResNet backbone
python3 conversion/convert_3c_to_4c_pth_convnext.py # ConvNeXt backbone
python3 conversion/convert_3c_to_4c_pth_vit.py # ViT backbone
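
If you script the training pipeline, the choice of conversion script can be made explicit. The following is a minimal sketch, not part of the repository: the script paths are the ones listed above, while the keys `resnet`, `convnext`, and `vit` and the helpers `conversion_command` and `run_conversion` are hypothetical names introduced for illustration. It assumes it is run from the repository root and that the scripts take no arguments, as in the commands above.

```python
import subprocess
from typing import List

# Script paths come from the commands above; the dictionary keys are
# hypothetical labels chosen for this sketch.
CONVERSION_SCRIPTS = {
    "resnet": "conversion/convert_3c_to_4c_pth.py",
    "convnext": "conversion/convert_3c_to_4c_pth_convnext.py",
    "vit": "conversion/convert_3c_to_4c_pth_vit.py",
}


def conversion_command(backbone: str) -> List[str]:
    """Return the argv list that converts weights for a backbone family."""
    key = backbone.lower()
    if key not in CONVERSION_SCRIPTS:
        raise ValueError(
            f"unknown backbone {backbone!r}; expected one of {sorted(CONVERSION_SCRIPTS)}"
        )
    return ["python3", CONVERSION_SCRIPTS[key]]


def run_conversion(backbone: str) -> None:
    """Run the conversion script and raise if it exits with an error."""
    subprocess.run(conversion_command(backbone), check=True)
```

For example, `run_conversion("convnext")` runs the ConvNeXt conversion script, and an unrecognized backbone name raises `ValueError` before anything is executed.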

We also provide models trained on a single task with a ResNet-50 backbone (Table 11 in the paper).
Task | YAML | Model |
---|---|---|
OD&IS | single_task_det | model |
REC&RES | single_task_rec | model |
VIS | single_task_vis | model |
RVOS | single_task_rvos | model |
SOT&VOS | single_task_sot | model |