Prefix Tuning dimension error with Qwen2 and missing vocab_size for PaliGemma2 #2315
Comments
@BenjaminBossan Is it okay if I work on this?
@marzi9696 Sure, you can give it a shot. I'll probably look into this tomorrow if there is no solution by then.
Update from my side:
@Florian-Dreyer I was trying to run your Jupyter notebook, but I could not actually run the data processing part. You also mentioned an "if_statement in train.py". I was reading the source code yesterday and noticed the if-statement as well. Was it this block of the code:
Do you mean the csv file for the _gemini dfs? The file is the other one on the GitHub repo I shared. No, it was a different one; this one should work after modifying the config of the model. The if-statement I meant compares the current vocab_size of the base model with the vocab_size of the original, newly loaded base model, and that is why it doesn't have the vocab_size attribute in its config.
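A minimal sketch of the config modification described in the comment above (the model id is illustrative, and it is an assumption that PaliGemmaConfig only exposes vocab_size on its nested text_config; whether this fixes the later forward pass is untested here):
from transformers import PaliGemmaForConditionalGeneration
from peft import PrefixTuningConfig, TaskType, get_peft_model

# Illustrative checkpoint; PaliGemma 2 models load through the same class.
model = PaliGemmaForConditionalGeneration.from_pretrained("google/paligemma2-3b-pt-224")
# Assumption: vocab_size lives on the nested text config, so copy it to the
# top-level config that PEFT's _setup_prompt_encoder reads.
model.config.vocab_size = model.config.text_config.vocab_size

peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
peft_model = get_peft_model(model, peft_config)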
System Info
PEFT: 0.14.0
Transformers: 4.48.0.dev0
Who can help?
@BenjaminBossan
Information
Tasks
examples folder
Reproduction
For Qwen we get the following error:
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/{user_name}/venv/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 84, in _worker
output = module(*input, **kwargs)
File "/home/{user_name}/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/{user_name}/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/{user_name}/venv/lib/python3.10/site-packages/peft/peft_model.py", line 1755, in forward
return self.base_model(input_ids=input_ids, inputs_embeds=inputs_embeds, **kwargs)
File "/home/{user_name}/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/{user_name}/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/{user_name}/venv/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1682, in forward
position_ids, rope_deltas = self.get_rope_index(
File "/home/{user_name}/venv/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1486, in get_rope_index
input_ids = input_ids[attention_mask[i] == 1]
IndexError: The shape of the mask [172] at index 0 does not match the shape of the indexed tensor [122] at index 0
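A minimal sketch of what appears to go wrong (the numbers are taken from the traceback; 172 - 122 = 50 would be the prefix length). Prefix tuning extends the attention mask by num_virtual_tokens to cover the prefix key/values, but the input_ids are left unchanged, so indexing the shorter tensor with the longer mask fails:
import torch

seq_len, num_virtual_tokens = 122, 50  # illustrative values from the traceback
input_ids = torch.zeros(seq_len, dtype=torch.long)
# PEFT prepends ones for the virtual tokens, so the mask is longer than input_ids:
attention_mask = torch.ones(seq_len + num_virtual_tokens)
# Roughly what Qwen2-VL's get_rope_index does per sample:
input_ids[attention_mask == 1]  # IndexError: mask [172] vs indexed tensor [122]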
And for PaliGemma2 this one:
AttributeError Traceback (most recent call last)
Cell In[68], line 8
6 tokenizer = processor.tokenizer
7 # Apply PEFT model adaptation
----> 8 peft_model = get_peft_model(model, peft_config)
10 # Print trainable parameters
11 peft_model.print_trainable_parameters()
File ~/venv/lib/python3.10/site-packages/peft/mapping.py:222, in get_peft_model(model, peft_config, adapter_name, mixed, autocast_adapter_dtype, revision, low_cpu_mem_usage)
220 if peft_config.is_prompt_learning:
221 peft_config = _prepare_prompt_learning_config(peft_config, model_config)
--> 222 return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](
223 model,
224 peft_config,
225 adapter_name=adapter_name,
226 autocast_adapter_dtype=autocast_adapter_dtype,
227 low_cpu_mem_usage=low_cpu_mem_usage,
228 )
File ~/venv/lib/python3.10/site-packages/peft/peft_model.py:1684, in PeftModelForCausalLM.__init__(self, model, peft_config, adapter_name, **kwargs)
1681 def __init__(
1682 self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default", **kwargs
1683 ) -> None:
-> 1684 super().__init__(model, peft_config, adapter_name, **kwargs)
1685 self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation
File ~/venv/lib/python3.10/site-packages/peft/peft_model.py:170, in PeftModel.__init__(self, model, peft_config, adapter_name, autocast_adapter_dtype, low_cpu_mem_usage)
168 self._peft_config = {adapter_name: peft_config}
169 self.base_model = model
--> 170 self.add_adapter(adapter_name, peft_config, low_cpu_mem_usage=low_cpu_mem_usage)
171 else:
172 self._peft_config = None
File ~/venv/lib/python3.10/site-packages/peft/peft_model.py:958, in PeftModel.add_adapter(self, adapter_name, peft_config, low_cpu_mem_usage)
955 dict_config = self.config
957 peft_config = _prepare_prompt_learning_config(peft_config, dict_config)
--> 958 self._setup_prompt_encoder(adapter_name)
959 elif peft_config.is_adaption_prompt:
960 self.base_model.add_adapter(adapter_name, peft_config)
File ~/venv/lib/python3.10/site-packages/peft/peft_model.py:642, in PeftModel._setup_prompt_encoder(self, adapter_name)
635 for named_param, value in list(transformer_backbone.named_parameters()):
636 # for ZeRO-3, the tensor is sharded across accelerators and deepspeed modifies it to a tensor with shape
637 # [0] the actual unsharded shape is stored in "ds_shape" attribute special handling is needed in case
638 # the model is initialized in deepspeed.zero.Init() context or HfDeepSpeedConfig has been called before
639 # For reference refer to issue: #996
640 deepspeed_distributed_tensor_shape = getattr(value, "ds_shape", None)
--> 642 if value.shape[0] == self.base_model.config.vocab_size or (
643 deepspeed_distributed_tensor_shape is not None
644 and deepspeed_distributed_tensor_shape[0] == self.base_model.config.vocab_size
645 ):
646 word_embeddings = transformer_backbone.get_submodule(named_param.replace(".weight", ""))
647 break
File ~/venv/lib/python3.10/site-packages/transformers/configuration_utils.py:211, in PretrainedConfig.__getattribute__(self, key)
209 if key != "attribute_map" and key in super().__getattribute__("attribute_map"):
210 key = super().__getattribute__("attribute_map")[key]
--> 211 return super().__getattribute__(key)
AttributeError: 'PaliGemmaConfig' object has no attribute 'vocab_size'
You can find the notebook to replicate the errors here:
https://github.com/Florian-Dreyer/PEFT_BUG/blob/main/prefix_tuning_peft.ipynb
Just execute the cells to get the errors.
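For reference, a condensed sketch of the Qwen setup (model id and num_virtual_tokens are illustrative; the notebook has the exact values):
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PrefixTuningConfig, TaskType, get_peft_model

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # illustrative checkpoint
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=50)
peft_model = get_peft_model(model, peft_config)

# inputs = processor(text=[...], images=[...], return_tensors="pt")
# peft_model(**inputs)  # raises the IndexError shown above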
Expected behavior
We would expect the models to be able to process the input. We tried just calling model(**inputs) but ran into the same error with Qwen. Note: The dimension difference is exactly the prefix length.
So the question is, how can we get the models to run? Is PaliGemma even supported?