-
Hi, thanks for your interest! The implementation of PI-resize during training is here: https://github.com/google-research/big_vision/blob/main/big_vision/models/proj/flexi/vit.py#L30-L75

In words: PI-resize does not introduce any new trainable parameters. You define a learnable parameter for the patch embedding just like in regular ViT: pick any patch size (it doesn't really matter which; we use 32x32), so allocate a 32x32x3x[model-dim] buffer. Then, before passing that buffer to the conv operation for patch embedding, multiply it by the PI-resize matrix. That matrix can be computed analytically once at the start and is not trained; see the code pointer above.

I'm not sure what loss you mean - there is no need to change whatever loss you are using when "flexifying" your training loop.
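To make the "multiply by an analytically computed matrix" step concrete, here is a minimal NumPy sketch. The function names (`resize_matrix`, `pi_resize_patch_embed`) and the nearest-neighbor 2x upsampling used as the resize op are my own illustrative choices, not the big_vision API; the actual implementation is JAX and lives at the link above. The idea is: any linear resize op corresponds to a matrix B, and the PI-resize weights are obtained via its pseudo-inverse so that tokens computed from resized patches match tokens computed from the original patches.

```python
import numpy as np

def resize_matrix(resize_fn, old_shape, new_shape):
    """Build the matrix B with resize_fn(x).ravel() == B @ x.ravel(),
    by resizing each one-hot basis 'image' in turn."""
    n_in = old_shape[0] * old_shape[1]
    cols = []
    for i in range(n_in):
        basis = np.zeros(n_in)
        basis[i] = 1.0
        cols.append(resize_fn(basis.reshape(old_shape)).ravel())
    return np.stack(cols, axis=1)  # (new_h*new_w, old_h*old_w)

def pi_resize_patch_embed(w, new_size, resize_fn):
    """Resize ViT patch-embedding weights w of shape (h, w, c_in, c_out)
    with the pseudo-inverse of the resize map: w_hat = pinv(B^T) @ w."""
    h, ww, c_in, c_out = w.shape
    B = resize_matrix(resize_fn, (h, ww), new_size)
    P = np.linalg.pinv(B.T)  # applied once; nothing here is trained
    w_flat = w.reshape(h * ww, c_in * c_out)  # resize acts per channel pair
    w_hat = P @ w_flat
    return w_hat.reshape(*new_size, c_in, c_out)
```

Usage matches the description above: keep one learnable 32x32 buffer, and at each step apply `pi_resize_patch_embed` (with the target patch size for that step) before the conv. For upsampling, the token-matching property holds exactly: the inner product of a patch with the original weights equals the inner product of the resized patch with the PI-resized weights.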
-
FlexiViT is a very imaginative work.
I have also been puzzling over flexible patch sizes.
I'd like to know how the PI-resize of Section 3.4 is implemented in the code,
and how PI-resize is optimized during training.