Error when running NVILA-8B-Video #190

Open
yuanyehome opened this issue Jan 16, 2025 · 0 comments

When I evaluated NVILA-8B-Video on lmms-longvideobench with this script:

#!/bin/bash
set -e

MODEL_NAMES=(
    "NVILA-8B-Video"
)
SELECTED_TASKS=(
    "lmms-longvideobench_val_v"
)
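# Join the task array into a comma-separated string for --tasks.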
TASK_STR=$(
    IFS=,
    echo "${SELECTED_TASKS[*]}"
)
echo "TASK_STR: $TASK_STR"

START_TIME=$(date +%s)
echo "START_TIME: $(date -d @"$START_TIME")"

for MODEL_NAME in "${MODEL_NAMES[@]}"; do
    MODEL_ID="../my-models/models/$MODEL_NAME"
    vila-eval \
        --model-name "$MODEL_NAME" \
        --model-path "$MODEL_ID" \
        --conv-mode auto \
        --tags-include local \
        --nproc-per-node 2 \
        --tasks "$TASK_STR" \
        --output-dir "./runs/run-eval-20250112/$MODEL_NAME"
done

END_TIME=$(date +%s)
echo "END_TIME: $(date -d @"$END_TIME")"
echo "TIME_TAKEN: $((END_TIME - START_TIME)) seconds"

I encountered this error:

2025-01-16 23:23:30.871 | WARNING  | llava.utils.media:_load_video:59 - Failed to read frame 8014 from video '/data/yy/.cache/huggingface/longvideobench/videos/BktEeBeA7a8.mp4'. Skipped.
2025-01-16 23:23:30.875 | WARNING  | llava.utils.media:_load_video:59 - Failed to read frame 8014 from video '/data/yy/.cache/huggingface/longvideobench/videos/BktEeBeA7a8.mp4'. Skipped.
Traceback (most recent call last):
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/lmms_eval/__main__.py", line 329, in cli_evaluate
    results, samples = cli_evaluate_single(args)
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/lmms_eval/__main__.py", line 470, in cli_evaluate_single
    results = evaluator.simple_evaluate(
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/lmms_eval/utils.py", line 533, in _wrapper
    return fn(*args, **kwargs)
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/lmms_eval/evaluator.py", line 243, in simple_evaluate
    results = evaluate(
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/lmms_eval/utils.py", line 533, in _wrapper
    return fn(*args, **kwargs)
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/lmms_eval/evaluator.py", line 457, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)  # Choiszt run generate until
  File "/data/yy/anker/nvila/VILA/llava/eval/lmms/models/vila_internal.py", line 106, in generate_until
    response = self.model.generate_content(prompt, generation_config=generation_config)
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/yy/anker/nvila/VILA/llava/model/llava_arch.py", line 834, in generate_content
    output_ids = self.generate(
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/yy/anker/nvila/VILA/llava/model/llava_arch.py", line 783, in generate
    inputs_embeds, _, attention_mask = self._embed(input_ids, media, media_config, None, attention_mask)
  File "/data/yy/anker/nvila/VILA/llava/model/llava_arch.py", line 415, in _embed
    media_embeds = self.__embed_media_tokens(media, media_config)
  File "/data/yy/anker/nvila/VILA/llava/model/llava_arch.py", line 488, in __embed_media_tokens
    embeds[name] = deque(self.encoders[name](media[name], media_config[name]))
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/yy/anaconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/yy/anker/nvila/VILA/llava/model/encoders/video/tsp.py", line 64, in forward
    return [process_features(f) for f in features]
  File "/data/yy/anker/nvila/VILA/llava/model/encoders/video/tsp.py", line 64, in <listcomp>
    return [process_features(f) for f in features]
  File "/data/yy/anker/nvila/VILA/llava/model/encoders/video/tsp.py", line 41, in _process_features
    features = pool(features, p, dim=dim)
  File "/data/yy/anker/nvila/VILA/llava/model/encoders/video/tsp.py", line 12, in pool
    return x.view(x.shape[:dim] + (-1, size) + x.shape[dim + 1 :]).mean(dim + 1)
RuntimeError: shape '[-1, 8, 16, 16, 3584]' is invalid for input of size 6422528
2025-01-16 23:23:31.233 | ERROR    | __main__:cli_evaluate:348 - Error during evaluation: shape '[-1, 8, 16, 16, 3584]' is invalid for input of size 6422528. Please set `--verbosity=DEBUG` to get more information.
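
If I'm reading the numbers right, the crash follows directly from the skipped frame: 6422528 = 7 × 16 × 16 × 3584, so after frame 8014 failed to read, only 7 of the 8 sampled frames survived, and the temporal pooling cannot split 7 frames into groups of 8. Here is a standalone sketch that reproduces the reshape failure (pool mirrors the view + mean at tsp.py line 12 in the traceback; the shapes are inferred from the error message):

import torch

def pool(x, size, dim):
    # Group `size` consecutive entries along `dim` and average them;
    # this requires x.shape[dim] to be divisible by `size`.
    return x.view(x.shape[:dim] + (-1, size) + x.shape[dim + 1:]).mean(dim + 1)

# 8 sampled frames pool cleanly into one group of 8:
ok = torch.randn(8, 16, 16, 3584)
print(pool(ok, size=8, dim=0).shape)  # torch.Size([1, 16, 16, 3584])

# With one frame skipped, 7 * 16 * 16 * 3584 = 6422528 elements cannot be
# viewed as [-1, 8, 16, 16, 3584] -- exactly the RuntimeError above:
bad = torch.randn(7, 16, 16, 3584)
pool(bad, size=8, dim=0)  # RuntimeError: shape '[-1, 8, 16, 16, 3584]' is invalid for input of size 6422528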

What is this TSPVideoEncoder, and how can I avoid this error?
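
If the root cause is just the frame count, a guard like this before the temporal pooling might avoid the crash; this is only a hypothetical sketch (features, p, and dim are the names visible in the traceback, and I have not verified that this is the right place to patch):

# Hypothetical: drop trailing frames so the count is a multiple of the pool size.
T = features.shape[dim]
if T % p != 0:
    features = features.narrow(dim, 0, T - T % p)

Or should _load_video be responsible for always returning the requested number of frames, even when a read fails?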
