Fine-tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia

Mohammad Fahes¹, Tuan-Hung Vu¹,², Andrei Bursuc¹,², Patrick Pérez³, Raoul de Charette¹
¹ Inria, Paris, France.

² valeo.ai, Paris, France.

³ Kyutai, Paris, France.

TL;DR: CLIP projects visual embeddings into the shared latent space using a linear projection layer. We show that simply fine-tuning this guy (:p) can be a strong alternative to linear probing, prompt tuning and CLIP-Adapter, and also performs well on test-time adaptation.

Stay tuned for the code!

Paper: https://arxiv.org/abs/2410.05270

ProLIP

We fine-tune the pretrained linear projection layer of the vision encoder, adding a regularization loss that keeps its weights close to the pretrained weights.
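Since the official code is not released yet, here is a hypothetical, dependency-free sketch of the idea under stated assumptions. Visual features taken *before* the last projection are cached and frozen. Only the projection matrix W is trained, starting from the pretrained W0 (in OpenAI's `clip` package the ViT image projection is the `visual.proj` parameter, e.g. 768×512 for ViT-B/16). Class text embeddings T stay frozen. The regularizer is assumed to be a squared Frobenius-norm penalty lam * ||W - W0||_F^2 added to the cross-entropy. L2 normalization and CLIP's learned temperature are omitted so the gradient stays short enough to write by hand. All function names below are illustrative, not the authors' API.

```python
import math
import random

# Hypothetical sketch of ProLIP-style fine-tuning (assumptions, not official code):
#   X  (N x D): cached, frozen visual features *before* the last projection
#   W0 (D x K): pretrained projection matrix; W is initialized from it
#   T  (C x K): frozen class text embeddings (one row per class)
#   logits = X @ W @ T^T / tau   (L2 normalization omitted for brevity)
#   loss   = mean cross-entropy + lam * ||W - W0||_F^2  (assumed penalty form)


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(W, W0, X, T, y, lam, tau):
    """Return the regularized loss and its gradient with respect to W."""
    n, num_classes = len(X), len(T)
    logits = [[v / tau for v in row] for row in matmul(matmul(X, W), transpose(T))]
    probs = [softmax(row) for row in logits]
    ce = -sum(math.log(probs[i][y[i]]) for i in range(n)) / n
    reg = sum((w - w0) ** 2 for rw, rw0 in zip(W, W0) for w, w0 in zip(rw, rw0))
    # dCE/dlogits = (softmax - one_hot) / n
    g = [[(probs[i][c] - (1.0 if c == y[i] else 0.0)) / n for c in range(num_classes)]
         for i in range(n)]
    # dCE/dW = X^T g T / tau; the penalty contributes 2 * lam * (W - W0)
    g_w = matmul(matmul(transpose(X), g), T)
    grad = [[gv / tau + 2.0 * lam * (w - w0) for gv, w, w0 in zip(rg, rw, rw0)]
            for rg, rw, rw0 in zip(g_w, W, W0)]
    return ce + lam * reg, grad


def finetune(W0, X, T, y, lam=0.1, tau=1.0, lr=0.05, steps=100):
    """Plain gradient descent on W only; X, T and W0 stay fixed."""
    W = [row[:] for row in W0]  # start from the pretrained weights
    for _ in range(steps):
        _, grad = loss_and_grad(W, W0, X, T, y, lam, tau)
        W = [[w - lr * gv for w, gv in zip(rw, rg)] for rw, rg in zip(W, grad)]
    return W


def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    D, K, C, N = 4, 3, 3, 6  # toy sizes, not CLIP's real dimensions
    X, W0, T = random_matrix(N, D), random_matrix(D, K), random_matrix(C, K)
    y = [i % C for i in range(N)]  # toy labels
    before, _ = loss_and_grad(W0, W0, X, T, y, lam=0.1, tau=1.0)
    W = finetune(W0, X, T, y)
    after, _ = loss_and_grad(W, W0, X, T, y, lam=0.1, tau=1.0)
    print(f"loss before: {before:.4f}, after: {after:.4f}")
```

Because everything before the projection is frozen, the features can be computed once and cached, so each training step only touches a single D×K matrix. The penalty term is what keeps W anchored to W0 when only a few labeled shots are available.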

Citation

@article{fahes2024fine,
  title={Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia},
  author={Fahes, Mohammad and Vu, Tuan-Hung and Bursuc, Andrei and P{\'e}rez, Patrick and de Charette, Raoul},
  journal={arXiv preprint arXiv:2410.05270},
  year={2024}
}
