I tried to fine-tune the segmentation model using the pretrained Vim-T weights, but encountered the following issue while executing `bash scripts/ft_vim_tiny_upernet.sh`:
Position interpolate from 14x14 to 32x32
Traceback (most recent call last):
  File "/home/vic1113/miniconda3/envs/vim_seg/lib/python3.9/site-packages/mmcv/utils/registry.py", line 69, in build_from_cfg
    return obj_cls(**args)
  File "/home/vic1113/PrMamba/seg/backbone/vim.py", line 89, in __init__
    self.init_weights(pretrained)
  File "/home/vic1113/PrMamba/seg/backbone/vim.py", line 143, in init_weights
    interpolate_pos_embed(self, state_dict_model)
  File "/home/vic1113/PrMamba/vim/utils.py", line 258, in interpolate_pos_embed
    pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)
RuntimeError: shape '[-1, 14, 14, 192]' is invalid for input of size 37824
The error propagates up through several wrappers, so the final message reads: `RuntimeError: EncoderDecoder: VisionMambaSeg: shape '[-1, 14, 14, 192]' is invalid for input of size 37824`.
The pretrained weights I used were vim_t_midclstok_76p1acc.pth, which seems to be the correct checkpoint. If it were the wrong one, I would expect a loading error such as `size mismatch for norm_f.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([384])`, but I didn't get anything like that.
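If it helps, a quick way to double-check what the checkpoint actually stores is to inspect the positional embedding directly (the 'model' wrapper key and the 'pos_embed' key name below are assumptions based on DeiT-style checkpoints, so adjust them to your file):

```python
import torch

# Quick sanity check on the checkpoint contents (key names are assumptions).
ckpt = torch.load("vim_t_midclstok_76p1acc.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

pos_embed = state_dict["pos_embed"]   # expected: [1, num_tokens, embed_dim]
print(pos_embed.shape)                # [1, 197, 192] would mean 14*14 patches + 1 extra token
print(pos_embed.numel())              # 197 * 192 = 37824, the size in the error message
```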
So, I guess there might be an issue with the model settings, but I'm not sure. Note that 37824 = (14*14 + 1) * 192, and the "+1" is exactly what makes the reshape fail. If that extra token is the mid cls token, should I simply pull it out before interpolating the position embedding for the segmentation model? A rough sketch of what I mean is below.
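This is only an illustration, not a tested fix: the helper name, the default cls_index, and the assumption that the mid cls token sits at the middle of the sequence are all mine, not the repo's.

```python
import torch
import torch.nn.functional as F

def interpolate_pos_embed_with_extra_token(pos_embed, orig_size=14, new_size=32, cls_index=None):
    """Interpolate a [1, N, C] position embedding that carries one extra (cls) token,
    by removing that token before the 2D reshape and re-inserting it afterwards.
    cls_index defaults to the middle of the patch sequence (an assumption for
    mid-cls-token weights); pass the real index if your model stores it elsewhere."""
    embed_dim = pos_embed.shape[-1]
    num_patches = orig_size * orig_size
    if cls_index is None:
        cls_index = num_patches // 2  # assumed middle position

    # Split off the extra token so the remaining length is exactly orig_size**2.
    cls_pos = pos_embed[:, cls_index:cls_index + 1, :]
    patch_pos = torch.cat([pos_embed[:, :cls_index, :], pos_embed[:, cls_index + 1:, :]], dim=1)

    # Standard bicubic interpolation of the 2D patch grid.
    patch_pos = patch_pos.reshape(1, orig_size, orig_size, embed_dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_size, new_size), mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_size * new_size, embed_dim)

    # Re-insert the cls token at the new middle (or drop this step if the seg model has no cls token).
    new_cls_index = (new_size * new_size) // 2
    return torch.cat([patch_pos[:, :new_cls_index, :], cls_pos, patch_pos[:, new_cls_index:, :]], dim=1)
```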
Has anyone encountered this problem before, or successfully fine-tuned a segmentation model?
Thank you very much!
No, I can't apply the pretrained weights to the segmentation model.
It seems the shapes of the backbones are different, and we might need to retrain it.
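For what it's worth, a quick way to see exactly which parameters disagree is to compare the two state dicts before loading (a rough sketch; `backbone` stands for whatever VisionMambaSeg instance your config builds, and the 'model' wrapper key is an assumption):

```python
import torch

# List every parameter whose shape differs between the pretrained checkpoint
# and the segmentation backbone ('backbone' is a placeholder for your instance).
ckpt = torch.load("vim_t_midclstok_76p1acc.pth", map_location="cpu")
ckpt_state = ckpt.get("model", ckpt)
model_state = backbone.state_dict()

for name, param in ckpt_state.items():
    if name not in model_state:
        print(f"only in checkpoint: {name} {tuple(param.shape)}")
    elif model_state[name].shape != param.shape:
        print(f"shape mismatch: {name} {tuple(param.shape)} (ckpt) vs {tuple(model_state[name].shape)} (model)")

for name in model_state:
    if name not in ckpt_state:
        print(f"only in model: {name}")
```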