Commit

[Bugfix] Fix num video tokens calculation for Qwen2-VL (vllm-project#13148)

Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 authored Feb 12, 2025
1 parent f4d97e4 commit 985b4a2
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion vllm/model_executor/models/qwen2_vl.py
@@ -800,7 +800,11 @@ def _get_vision_info(
 preprocessed_size = ImageSize(width=image_width,
                               height=image_height)

-grid_t = max(num_frames // temporal_patch_size, 1)
+# NOTE: Frames are padded to be divisible by `temporal_patch_size`
+# https://github.com/huggingface/transformers/blob/v4.48.3/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py#L294
+padded_num_frames = num_frames + num_frames % temporal_patch_size
+
+grid_t = max(padded_num_frames // temporal_patch_size, 1)
 grid_h = preprocessed_size.height // patch_size
 grid_w = preprocessed_size.width // patch_size
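
To make the effect of the change concrete, here is a minimal standalone sketch that mirrors the two `grid_t` formulas from the hunk above. The function names `grid_t_before`, `grid_t_after`, and `grid_t_ceil` are hypothetical and exist only for this illustration; they are not part of vLLM. The sketch assumes `temporal_patch_size = 2`, which is believed to be the Qwen2-VL default but is not stated in this commit.

```python
def grid_t_before(num_frames: int, temporal_patch_size: int) -> int:
    # Formula removed by this commit: floor division drops a trailing
    # partial group of frames.
    return max(num_frames // temporal_patch_size, 1)


def grid_t_after(num_frames: int, temporal_patch_size: int) -> int:
    # Formula added by this commit: pad the frame count first, then divide.
    padded_num_frames = num_frames + num_frames % temporal_patch_size
    return max(padded_num_frames // temporal_patch_size, 1)


def grid_t_ceil(num_frames: int, temporal_patch_size: int) -> int:
    # Hypothetical reference: round up to the next multiple of
    # temporal_patch_size (plain ceiling division), for comparison only.
    return max(-(-num_frames // temporal_patch_size), 1)


if __name__ == "__main__":
    tps = 2  # assumed value, see the note above
    for n in (1, 4, 5, 7):
        print(n, grid_t_before(n, tps), grid_t_after(n, tps), grid_t_ceil(n, tps))
```

With an odd frame count such as 5 and `temporal_patch_size = 2`, the old formula gives 2 temporal patches while the new one gives 3. For `temporal_patch_size = 2` the new formula agrees with ceiling division; for larger values the two can differ (for example 5 frames with a patch size of 4), so the expression may need re-checking if that parameter ever changes.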
