SmolVLM-VSR

Finetuning SmolVLM for visual reasoning tasks

Setup

git clone https://github.com/bharathsivaram10/SmolVLM-VSR.git

This assumes you have an environment set up with common deep learning libraries such as PyTorch and transformers. I ran this on an A100 from https://lambdalabs.com/. I don't recommend an older GPU generation since, as far as I know, flash attention is optimized for Ampere and newer GPUs.
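As a quick sanity check before running anything, the sketch below reports whether the local GPU is Ampere or newer. It assumes PyTorch is installed and treats CUDA compute capability 8.0 as the Ampere threshold. The helper names are illustrative and are not part of this repo.

```python
def is_ampere_or_newer(capability):
    """Return True if a (major, minor) CUDA compute capability is 8.0 or higher."""
    major, _minor = capability
    return major >= 8


def detect_capability():
    """Return the (major, minor) capability of GPU 0, or None if unavailable."""
    try:
        import torch  # assumed to be installed in your environment
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA GPU (or no PyTorch) detected.")
    elif is_ampere_or_newer(cap):
        print(f"Compute capability {cap[0]}.{cap[1]}: Ampere or newer.")
    else:
        print(f"Compute capability {cap[0]}.{cap[1]}: older than Ampere.")
```

An A100 reports capability 8.0, so it passes this check.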

Run the bash script to update some packages and download the dataset repo and images:

cd SmolVLM-VSR
bash setup.sh

Sometimes the Dropbox link used in setup.sh fails because it has had too many downloads that day. In that case, go into the visual-spatial-reasoning/data directory, follow the instructions there to download the COCO data, and run the select_only_revlevant_images.py script. I suggest saving the images locally so you can skip the image download step in the future.

If you have already downloaded the images and just want to install the dependencies, run the script with the --skip-images option:

bash setup.sh --skip-images
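If you script your setup, the choice between the two commands above could be automated. This is a minimal sketch that assumes your saved images sit directly inside a single directory. The directory path, the suffix list, and the helper names are hypothetical and are not part of setup.sh.

```python
from pathlib import Path

# Assumed image extensions; adjust to match how you saved the images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def has_local_images(image_dir):
    """Return True if image_dir exists and directly contains at least one image file."""
    root = Path(image_dir)
    if not root.is_dir():
        return False
    return any(p.suffix.lower() in IMAGE_SUFFIXES for p in root.iterdir())


def setup_command(image_dir):
    """Build the setup command, adding --skip-images when images are already present."""
    cmd = ["bash", "setup.sh"]
    if has_local_images(image_dir):
        cmd.append("--skip-images")
    return cmd
```

The resulting list can be passed to subprocess.run(..., check=True) from inside the SmolVLM-VSR directory.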

You can now follow along in the notebook!
