Running the model on an image. Note: I am using a quantized (qint8) version of the model.
Debugger call stack (top of stack first):

> deepseek_vl2.models.modeling_deepseek.apply_rotary_pos_emb : 360
  deepseek_vl2.models.modeling_deepseek.forward : 886 (Current frame)
  torch.nn.modules.module._call_impl : 1750
  torch.nn.modules.module._wrapped_call_impl : 1739
  deepseek_vl2.models.modeling_deepseek.forward : 1298
  torch.nn.modules.module._call_impl : 1750
  torch.nn.modules.module._wrapped_call_impl : 1739
  deepseek_vl2.models.modeling_deepseek.forward : 1585
  torch.nn.modules.module._call_impl : 1750
When apply_rotary_pos_emb is called, the cos and sin tensors are empty, which results in an error.
So DeepSeek changed the original LLaMA code, and these buffers are apparently never initialized in this model. How do we get around this error?
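For context, a minimal sketch of what the standard LLaMA-style apply_rotary_pos_emb does (this is the generic Hugging Face pattern, not the exact DeepSeek-VL2 code, which differs); if the cos/sin caches are empty, the indexing by position_ids is where it falls over:

import torch

def rotate_half(x):
    # Split the last dimension in half and swap the halves with a sign flip.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    # cos/sin are the cached rotary tables. If they were never populated
    # (empty buffers), this indexing by position_ids is what raises the error.
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed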
After setting chunk_size=512 (the prefilling size), the error above went away, but a new error appeared.
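For reference, chunk_size is passed to the model's incremental prefilling step. A rough sketch of how I set it, following the DeepSeek-VL2 inference example (vl_gpt and prepare_inputs come from that example; argument names may differ slightly in your copy of the repo):

with torch.no_grad():
    # Prefill the prompt (including image tokens) in chunks of 512 positions
    # instead of all at once.
    inputs_embeds, past_key_values = vl_gpt.incremental_prefilling(
        input_ids=prepare_inputs.input_ids,
        images=prepare_inputs.images,
        images_seq_mask=prepare_inputs.images_seq_mask,
        images_spatial_crop=prepare_inputs.images_spatial_crop,
        attention_mask=prepare_inputs.attention_mask,
        chunk_size=512,  # prefilling size
    )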
The new error message was about updating the version counter on a torch tensor. The first problem: why, in inference mode, would the model be updating a tensor's version at all? The second problem: Quanto does not handle this internally, since the tensor in question is quantized.
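For anyone unfamiliar with the first point: PyTorch keeps a per-tensor version counter that is bumped by in-place modifications, and the autograd/Quanto machinery can complain when it changes unexpectedly. A purely illustrative sketch with ordinary tensors (_version is a private attribute, used here only to show the mechanism; the exact trigger inside the quantized forward pass is what I could not pin down):

import torch

w = torch.ones(4, 4)
print(w._version)    # 0: fresh tensor, no in-place ops yet

w.mul_(2)            # any in-place op bumps the version counter
print(w._version)    # 1

# A detached clone is a brand-new tensor with its own counter and no autograd
# history, which is why the workaround below reads from a clone rather than
# from the quantized parameter itself.
c = w.detach().clone()
print(c._version)    # 0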
Here is the relevant code in def forward (modeling_deepseek.py, around line 898) where I had to make a change:
# Inserted: detached clone to bypass the version-update error on the quantized tensor,
# keeping the original dtype and device.
qclone = self.kv_b_proj.weight.detach().clone()
qclone = qclone.to(dtype=self.kv_b_proj.weight.dtype, device=self.kv_b_proj.weight.device)

# Original line, which raises the version error on the quantized tensor:
# kv_b_proj = self.kv_b_proj.weight.view(self.num_heads, -1, self.kv_lora_rank)
kv_b_proj = qclone.view(self.num_heads, -1, self.kv_lora_rank)  # use the cloned tensor instead