
Which SFT setup is recommended now? #14

Open
tyleryzhu opened this issue Sep 12, 2024 · 1 comment

Comments

@tyleryzhu

It seems like there are three different SFT setups recommended between the code and the paper.

Paper:

  • Stage 2: 600k image instructions from ALLaVA, 240k video instructions

Code (your ckpt):

  • Stage 2.1: 600k images, 300k video captions
  • Stage 2.2: 100k images, 200k video QA

Code (new recipe I assume?):

  • Stage 2: 600k images, 240k video instruction/QA (?), 15k video captions.

I assume the new recipe is one you tested and that it gets the same or better numbers than those in the paper? If you could clarify the different settings, that would be much appreciated. Thank you!

@RifleZhang
Owner

Hello,
per the code at https://github.com/RifleZhang/LLaVA-Hound-DPO/blob/main/llava_hound_dpo/sft_scripts/video_sft_qa_240k.sh#L19, the SFT stage uses 100k image instructions + 240k video QA. A small set of 15k captions is mixed in, which was inspired by ShareGPT4V training, but we haven't tested what happens if that data is removed.
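For reference, here is a minimal sketch of how a mixture like that (100k image instructions + 240k video QA + 15k captions) could be assembled into a single shuffled training file. The file names below are hypothetical placeholders, not the repo's actual paths; the real mixture is defined in `video_sft_qa_240k.sh`:

```python
import json
import random

# Hypothetical placeholder paths -- the actual data files are set in
# video_sft_qa_240k.sh and the repo's data-prep scripts.
DATA_SOURCES = [
    "image_instruction_100k.json",  # 100k image instructions
    "video_qa_240k.json",           # 240k video QA pairs
    "video_caption_15k.json",       # 15k captions (ShareGPT4V-style mix-in)
]

def build_sft_mixture(paths, seed=42):
    """Concatenate instruction files and shuffle them into one SFT set."""
    samples = []
    for path in paths:
        with open(path) as f:
            samples.extend(json.load(f))
    random.Random(seed).shuffle(samples)
    return samples

if __name__ == "__main__":
    mixture = build_sft_mixture(DATA_SOURCES)
    with open("sft_mixture.json", "w") as f:
        json.dump(mixture, f)
    print(f"Wrote {len(mixture)} samples")
```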
