Question About Listed ViT Models in the configs/proj/flexivit/README.md #69
-
Hello! First of all, thank you very much for releasing so many helpful materials and code samples for the interesting FlexiViT work. When I went through the paper, the models referred to as ViT-B-16 and ViT-B-30 seemed to be the baseline ViT models trained with fixed patch sizes (16 and 30, respectively). Accordingly, their positional embedding grid sizes should be 15 and 8 if I am not mistaken (img_size divided by patch_size). The linked checkpoints do not appear to match these shapes, so I was curious whether the links map to the wrong models or whether I misunderstood the setup described in the paper for these models. Could you please help me with this matter?
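For reference, this is the quick arithmetic behind the grid sizes mentioned above (a small illustrative snippet; the 240×240 training resolution is an assumption taken from the paper, not stated in this thread):

```python
# Expected position-embedding grid sizes, assuming the paper's
# 240x240 training resolution (assumption; not stated in this thread).
img_size = 240
for patch_size in (16, 30):
    assert img_size % patch_size == 0
    print(patch_size, "->", img_size // patch_size)  # 16 -> 15, 30 -> 8
```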
-
Hi, thanks for your interest and the question!

You almost got it. For simplicity/uniformity of implementation, we also used the "underlying" patch and posemb sizes of 32 and 7 for the baseline models. Figures 17(b) and (c) in the appendix show that this change has absolutely no effect on the results, even for regular (non-flexi) ViT models.

So, for the patch embeddings, you can simply resize them to 16 and 30 at load time with PI-resize; for the position embeddings, resize them the usual way at load time, i.e. with (bi)linear interpolation. The code does both here: https://github.com/google-research/big_vision/blob/main/big_vision/models/proj/flexi/vit.py#L198-L206

To be clear, I did not go and double-check the checkpoints just now (though I believe I did check them when originally uploading), so do let me know if they somehow don't work.
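For concreteness, here is a minimal NumPy/JAX sketch of the two resize operations, not the big_vision implementation itself (see the linked vit.py for that). It assumes a `(p, p, c_in, c_out)` patch-embedding kernel and an `(h, w, d)` position-embedding grid; the function names are made up for illustration:

```python
import numpy as np
import jax.image


def pi_resize_patch_embed(kernel, new_p):
  """PI-resize a (p, p, c_in, c_out) patch-embedding kernel to new_p x new_p.

  Builds the linear map B that bilinearly resizes a p x p patch to
  new_p x new_p, then applies pinv(B^T) to the kernel, so that
  <resize(x), new_kernel> ~= <x, old_kernel> for any patch x.
  """
  p = kernel.shape[0]
  # Column i of B is the bilinearly resized i-th basis "image", flattened.
  eye = np.eye(p * p).reshape(p * p, p, p)
  resize = lambda img: np.asarray(
      jax.image.resize(img, (new_p, new_p), method="bilinear"))
  b = np.stack([resize(e).reshape(-1) for e in eye], axis=1)  # (new_p^2, p^2)
  pinv_bt = np.linalg.pinv(b.T)                               # (new_p^2, p^2)
  # Apply the same linear map to every (c_in, c_out) slice of the kernel.
  flat = kernel.reshape(p * p, -1)                            # (p^2, c_in*c_out)
  return (pinv_bt @ flat).reshape(new_p, new_p, *kernel.shape[2:])


def resize_posemb(posemb, new_hw):
  """Bilinearly resize an (h, w, d) position-embedding grid to new_hw.

  A class-token embedding, if any, would be split off and handled separately.
  """
  return jax.image.resize(
      posemb, (*new_hw, posemb.shape[-1]), method="bilinear")
```

So, under this sketch, `pi_resize_patch_embed(w, 16)` / `pi_resize_patch_embed(w, 30)` would map the stored 32×32 kernel to the two baseline patch sizes, and `resize_posemb(pe, (15, 15))` / `resize_posemb(pe, (8, 8))` the stored 7×7 grid.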
-
Hi, after carefully checking the relevant parts of the paper and the code portion you pointed to, it all makes sense to me now.