Code for the paper "All in an Aggregated Image for In-Image Learning".
pip install -r requirements.txt
The processed dataset and demonstration examples are available from this link.
After downloading, unzip the file and place the dataset directory in the root directory of the project:
----IIL
|----dataset
|----src
...
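As a quick sanity check before running any experiment, the layout above can be verified with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and it only checks the two directories named in the tree (`dataset` and `src`), since the rest of the tree is elided.

```python
from pathlib import Path

# Directory names taken from the layout shown above.
# The remaining entries of the tree are elided there, so they are not checked.
EXPECTED_DIRS = ("dataset", "src")


def missing_dirs(root):
    """Return the expected sub-directories that are absent under `root`.

    Hypothetical helper for illustration; it is not part of the project.
    """
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_dirs(".")` from the project root should return an empty list once the unzipped dataset directory is in place.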
python run_iil.py --exp_name exp_on_mv --dataset mathvista --lt few_shot
python run_vticl.py --exp_name exp_on_mv --dataset mathvista --lt few_shot
python run_ticl.py --exp_name exp_on_mv --dataset mathvista --lt few_shot
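The three commands above differ only in the script name, so they can be driven from one small wrapper. The sketch below is an illustration rather than part of the repository: `build_commands` and `run_all` are hypothetical names, and it assumes the wrapper is started from the directory that contains the three scripts. The script names and the `--exp_name`, `--dataset`, and `--lt` flags are copied from the commands above.

```python
import subprocess
import sys

# Script names as they appear in the commands above.
SCRIPTS = ("run_iil.py", "run_vticl.py", "run_ticl.py")


def build_commands(exp_name, dataset, lt):
    """Build one argument list per script, all sharing the same flags."""
    shared = ["--exp_name", exp_name, "--dataset", dataset, "--lt", lt]
    # sys.executable reuses the interpreter that runs this wrapper.
    return [[sys.executable, script, *shared] for script in SCRIPTS]


def run_all(exp_name="exp_on_mv", dataset="mathvista", lt="few_shot"):
    """Run the scripts one after another, stopping at the first failure."""
    for cmd in build_commands(exp_name, dataset, lt):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` with its defaults reproduces the three commands above in order; passing different arguments changes the shared flags for all three scripts at once.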
If you find In-Image Learning useful for your research or applications, please cite it using this BibTeX entry:
@misc{wang2024single,
  title={All in a Single Image: Large Multimodal Models are In-Image Learners},
  author={Lei Wang and Wanyu Xu and Zhiqiang Hu and Yihuai Lan and Shan Dong and Hao Wang and Roy Ka-Wei Lee and Ee-Peng Lim},
  year={2024},
  eprint={2402.17971},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}