I tried running DeepSeek-VL2-tiny on the CPU, but it throws an error.
Does the model support CPU inference?
I traced the error back to `memory_efficient_attention` in the xformers package. When I checked the operator bindings, I saw that the operator's IMPL is registered only for the CUDA dispatch key.
So, how can I run this model on a CPU?
Below is the output:
```
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(19, 729, 16, 72) (torch.bfloat16)
     key         : shape=(19, 729, 16, 72) (torch.bfloat16)
     value       : shape=(19, 729, 16, 72) (torch.bfloat16)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`ckF` is not supported because:
    device=cpu (supported: {'cuda'})
    bf16 is only supported on A100+ GPUs
```
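One possible workaround (a sketch, not an official fix): since `xformers.ops.memory_efficient_attention` only has CUDA kernels, you can route the call to PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention` on CPU. The helper name `memory_efficient_attention_cpu` below is my own; it assumes xformers' `(batch, seq_len, heads, head_dim)` tensor layout and casts bf16 inputs to float32 for the CPU path, since bf16 attention support on CPU varies by PyTorch version.

```python
import torch
import torch.nn.functional as F

def memory_efficient_attention_cpu(query, key, value, attn_bias=None, p=0.0):
    """CPU fallback mimicking xformers' memory_efficient_attention.

    xformers expects (batch, seq_len, heads, head_dim);
    torch's SDPA expects (batch, heads, seq_len, head_dim),
    so we transpose in and out, and compute in float32.
    """
    q = query.transpose(1, 2).float()
    k = key.transpose(1, 2).float()
    v = value.transpose(1, 2).float()
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias, dropout_p=p)
    # Restore xformers layout and the caller's original dtype.
    return out.transpose(1, 2).to(query.dtype)
```

You could then monkey-patch the model's attention call (or the xformers import site) to use this function when `query.device.type == "cpu"`, keeping the CUDA path unchanged on GPU.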