Official code of *Exploring the Design Space of Visual Context Representation in Video MLLMs*

Richar-Du/Opt-Visor

Exploring the Design Space of Visual Context Representation in Video MLLMs

📰 News

[2024.10.12] Released the inference code for Opt-Visor.

🛠️ Requirements

  • Python == 3.10.12

  • CUDA Version == 12.4

pip install -r requirements.txt

🌍 Model Zoo

| Model Name | Visual Encoder | Language Decoder | # Training Frames | Tokens per Frame |
| --- | --- | --- | --- | --- |
| Opt-Visor-120frame-49token-Qwen2-7B | siglip-so400m-patch14-384 | Qwen2-7B | 120 | 49 |
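
As a rough guide to the visual context size implied by the row above, the sketch below multiplies the number of frames by the tokens per frame. It assumes the visual context is simply frames × tokens per frame, with no separator or special tokens added; the function name `visual_context_tokens` is hypothetical and is not part of this repository.

```python
def visual_context_tokens(num_frames: int, tokens_per_frame: int) -> int:
    """Estimate the total number of visual tokens.

    Assumption: the visual context is num_frames * tokens_per_frame,
    with no additional separator or special tokens.
    """
    if num_frames <= 0 or tokens_per_frame <= 0:
        raise ValueError("num_frames and tokens_per_frame must be positive")
    return num_frames * tokens_per_frame


if __name__ == "__main__":
    # Values taken from the Model Zoo row: 120 frames, 49 tokens per frame.
    print(visual_context_tokens(120, 49))  # 5880
```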

🤖 Inference

Run the following command to get the response of an instruction:

python inference.py \
       --model_path /path/to/Opt-Visor \
       --gpu_id 0 \
       --video_path /path/to/your/video \
       --question "Please describe the video in detail."
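
To ask the same question about several videos, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the repository: it assumes `inference.py` is in the current working directory, uses only the four flags shown above, and the helper names `build_inference_command` and `run_on_videos` as well as the `*.mp4` pattern are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(model_path, video_path, question, gpu_id=0,
                            script="inference.py"):
    """Build the argument list for one inference.py call (hypothetical helper).

    Only the flags documented in this README are used.
    """
    return [
        sys.executable, script,
        "--model_path", str(model_path),
        "--gpu_id", str(gpu_id),
        "--video_path", str(video_path),
        "--question", question,
    ]


def run_on_videos(model_path, video_dir, question, gpu_id=0, pattern="*.mp4"):
    """Run inference.py once per matching video in video_dir, in sorted order."""
    for video in sorted(Path(video_dir).glob(pattern)):
        cmd = build_inference_command(model_path, video, question, gpu_id)
        # check=True raises CalledProcessError if inference.py exits with an error.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command for a single video; replace the placeholder paths.
    print(" ".join(build_inference_command(
        "/path/to/Opt-Visor",
        "/path/to/your/video",
        "Please describe the video in detail.",
    )))
```

Note that each call starts a new process, so the model is reloaded for every video; this keeps the sketch independent of the internals of `inference.py`.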

To Do List

  • Release the inference code.
  • Release the model.

📑 Citation

