I tried running DeepSeek-VL2-tiny on the CPU, but it throws an error.
Does the model support CPU inference?
I traced the error back to `memory_efficient_attention` in the xformers package. When I checked the operator bindings, I saw that the operator's IMPL is registered only for the CUDA dispatch key.
So, how can I run this model on a CPU?
Below is the output:
```
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(19, 729, 16, 72) (torch.bfloat16)
     key         : shape=(19, 729, 16, 72) (torch.bfloat16)
     value       : shape=(19, 729, 16, 72) (torch.bfloat16)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`ckF` is not supported because:
    device=cpu (supported: {'cuda'})
    bf16 is only supported on A100+ GPUs
```
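One possible workaround (a sketch, not an official fix): since `xformers.ops.memory_efficient_attention` only has CUDA kernels, you can route the call to PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention` on CPU. The helper name `memory_efficient_attention_cpu` below is my own; it assumes xformers' `(batch, seq_len, heads, head_dim)` tensor layout and casts bf16 inputs to float32 for the CPU path, since bf16 attention support on CPU varies by PyTorch version.

```python
import torch
import torch.nn.functional as F

def memory_efficient_attention_cpu(query, key, value, attn_bias=None, p=0.0):
    """CPU fallback mimicking xformers' memory_efficient_attention.

    xformers expects (batch, seq_len, heads, head_dim);
    torch's SDPA expects (batch, heads, seq_len, head_dim),
    so we transpose in and out, and compute in float32.
    """
    q = query.transpose(1, 2).float()
    k = key.transpose(1, 2).float()
    v = value.transpose(1, 2).float()
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias, dropout_p=p)
    # Restore xformers layout and the caller's original dtype.
    return out.transpose(1, 2).to(query.dtype)
```

You could then monkey-patch the model's attention call (or the xformers import site) to use this function when `query.device.type == "cpu"`, keeping the CUDA path unchanged on GPU.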