# VSR_LRS3

## Performance and checkpoints

We only train the linear projector in this recipe.

| Encoder | Projector | LLM | Test (WER%) |
|---|---|---|---|
| AV-HuBERT Large + Self-Training | Linear (~15.74M) | vicuna-7b-v1.5 | 29.47 |

## Data preparation

Follow the steps in the `preparation` folder of [av_hubert](https://github.com/facebookresearch/av_hubert) to pre-process the LRS3 dataset.
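
A rough sketch of that workflow, assuming the standard av_hubert repository layout (the exact preparation scripts live in that repository and may change):

```bash
# Sketch only: clone av_hubert together with its fairseq submodule, then
# follow its LRS3 preparation instructions (directory layout may differ).
git clone --recursive https://github.com/facebookresearch/av_hubert.git
cd av_hubert/avhubert/preparation
# Run the LRS3 pre-processing steps described in av_hubert's documentation,
# e.g. downloading LRS3, cropping mouth ROIs, and writing manifest files.
```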

## Environment

Use the specific fairseq version of av_hubert, which is compatible with hydra-core versions below 1.0.7 and omegaconf versions below 2.0.6.
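
A minimal setup sketch, assuming av_hubert has been cloned as above; the version pins simply reflect the constraints stated here, not an official install script:

```bash
# Install av_hubert's bundled fairseq (shipped as a git submodule).
cd av_hubert
git submodule init && git submodule update
pip install -r requirements.txt
cd fairseq && pip install --editable ./
# Pin hydra-core and omegaconf below the versions noted above.
pip install "hydra-core<1.0.7" "omegaconf<2.0.6"
```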

## Decode with checkpoints

```bash
bash decode_avhubert_vo_vicuna_7b.sh
```

Modify the paths, including `speech_encoder_path`, `llm_path`, `output_dir`, `ckpt_path`, and `decode_log`, in the script before you run it.
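
The variable names below are the ones listed above; the values are placeholder paths you would replace with your own (an illustration, not the script's actual defaults):

```bash
# Hypothetical example values -- edit the corresponding lines in
# decode_avhubert_vo_vicuna_7b.sh before running it.
speech_encoder_path=/path/to/avhubert_large_self_trained.pt   # AV-HuBERT Large + Self-Training checkpoint
llm_path=/path/to/vicuna-7b-v1.5                              # LLM weights
output_dir=/path/to/output_dir                                # where decoding results are written
ckpt_path=/path/to/trained_linear_projector.pt                # trained projector checkpoint to decode with
decode_log=$output_dir/decode_log                             # decoding log file
```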

## Train a new model

Use the visual part of AV-HuBERT Large as the encoder:

```bash
bash finetune_avhubert_vo_vicuna_7b.sh
```
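
Presumably the same kind of path variables as in the decode script (e.g. `speech_encoder_path`, `llm_path`, `output_dir`) need to be set here as well; a hedged sketch:

```bash
# Illustrative only: set these paths in finetune_avhubert_vo_vicuna_7b.sh
# before launching training (exact variable names may differ in the script).
speech_encoder_path=/path/to/avhubert_large_self_trained.pt
llm_path=/path/to/vicuna-7b-v1.5
output_dir=/path/to/save/projector_checkpoints
bash finetune_avhubert_vo_vicuna_7b.sh
```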