Current transformers version v4.40 is not supported #38

maxjeblick · 2024-07-16T14:28:53Z

When trying to run the "Streaming with H2O" example, I observe the error below:
(I also had to change the import statement from utils_real_drop.stream import load, download_url, load_jsonl in run_streaming.py.)

The latest transformers version that works is 4.33. Starting from 4.34, I get issues (4.34 gets this issue).
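Until the code supports newer releases, one option is to fail early with a clear message instead of a deep traceback. The following is only a sketch: the cutoff (4, 33) is taken from the observation above, and the helper names major_minor, is_supported and check_installed_transformers are hypothetical, not part of this repository.

```python
from importlib import metadata

# Assumption taken from the report above: 4.33 is the last working release.
LAST_WORKING = (4, 33)


def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '4.33.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(version: str) -> bool:
    """True if the given transformers version is at or below the cutoff."""
    return major_minor(version) <= LAST_WORKING


def check_installed_transformers() -> None:
    """Raise a readable error if the installed transformers is too new."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError as exc:
        raise RuntimeError("transformers is not installed") from exc
    if not is_supported(installed):
        raise RuntimeError(
            f"transformers {installed} is newer than the last version "
            f"reported to work ({LAST_WORKING[0]}.{LAST_WORKING[1]}.x)"
        )
```

Calling check_installed_transformers() at the top of run_streaming.py would surface the version mismatch before any model weights are downloaded.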

(venv) root@6770922:/mount/data/kv_press/h2o_heavy_hitter/H2O/h2o_hf# bash ./scripts/streaming/eval.sh h2o
Loading model from lmsys/vicuna-13b-v1.3 ...
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 727/727 [00:00<00:00, 2.87MB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 88.8MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 435/435 [00:00<00:00, 1.55MB/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 567/567 [00:00<00:00, 2.18MB/s]
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
pytorch_model.bin.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33.4k/33.4k [00:00<00:00, 62.4MB/s]
pytorch_model-00001-of-00003.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.95G/9.95G [01:20<00:00, 123MB/s]
pytorch_model-00002-of-00003.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.90G/9.90G [01:31<00:00, 108MB/s]
pytorch_model-00003-of-00003.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.18G/6.18G [00:48<00:00, 127MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [03:41<00:00, 73.76s/it]
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
H2OKVCache-LayerWise: 48, 2000
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:11<00:00,  3.81s/it]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 132/132 [00:00<00:00, 585kB/s]
Loading data from data/mt_bench.jsonl ...
Downloading https://raw.githubusercontent.com/lm-sys/FastChat/main/fastchat/llm_judge/data/mt_bench/question.jsonl

USER: Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions.

ASSISTANT: Traceback (most recent call last):
  File "/mount/data/kv_press/h2o_heavy_hitter/H2O/h2o_hf/run_streaming.py", line 148, in <module>
    main(args)
  File "/mount/data/kv_press/h2o_heavy_hitter/H2O/h2o_hf/run_streaming.py", line 119, in main
    streaming_inference_heavy_hitter(
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mount/data/kv_press/h2o_heavy_hitter/H2O/h2o_hf/run_streaming.py", line 94, in streaming_inference_heavy_hitter
    past_key_values = greedy_generate(
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mount/data/kv_press/h2o_heavy_hitter/H2O/h2o_hf/run_streaming.py", line 21, in greedy_generate
    outputs = model(
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1211, in forward
    outputs = self.model(
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1018, in forward
    layer_outputs = decoder_layer(
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 741, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mount/data/kv_press/h2o_heavy_hitter/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: H2OLlamaAttention_streaming.forward() got an unexpected keyword argument 'cache_position'
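
The traceback suggests that the decoder layer in the installed transformers passes a cache_position keyword that the H2O attention override does not declare. The sketch below reproduces that kind of mismatch with plain Python stand-ins. LegacyAttention and TolerantAttention are hypothetical classes, not the real H2OLlamaAttention_streaming, and accepting the extra keyword only removes the TypeError. It is an assumption that further changes (for example to how the KV cache is passed) would also be needed for newer transformers versions.

```python
class LegacyAttention:
    """Stand-in for an attention module written against an older signature."""

    def forward(self, hidden_states, past_key_value=None, use_cache=False):
        # Returns the same 3-tuple shape that the traceback shows being unpacked.
        return hidden_states, None, past_key_value


class TolerantAttention(LegacyAttention):
    """Same behavior, but accepts keywords added by newer callers."""

    def forward(self, hidden_states, past_key_value=None, use_cache=False,
                cache_position=None, **kwargs):
        # cache_position and any other unknown keywords are accepted and ignored.
        return super().forward(
            hidden_states, past_key_value=past_key_value, use_cache=use_cache
        )


def call_like_newer_caller(attn):
    """Mimic a caller that passes the extra cache_position keyword."""
    return attn.forward("h", past_key_value=None, use_cache=True,
                        cache_position=[0])


try:
    call_like_newer_caller(LegacyAttention())
    legacy_error = None
except TypeError as exc:
    # Same failure mode as in the traceback above.
    legacy_error = str(exc)

tolerant_result = call_like_newer_caller(TolerantAttention())
```

Here legacy_error holds the "unexpected keyword argument" message, while the tolerant variant returns normally.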


benja-matic mentioned this issue Aug 1, 2024

Fix RotaryEmbedding import in modify_gptneox #41

Open
