
Does not work on CPU #41

Open
ghost opened this issue Jan 25, 2025 · 3 comments

Comments

@ghost

ghost commented Jan 25, 2025

I tried running DeepSeek-VL2-tiny on the CPU, but it throws an error.

Does it support CPU inference?

I tried tracing it back, and the error is thrown from memory_efficient_attention in the xformers package. When I checked the operator bindings, I saw that the operator's IMPL is registered only for the CUDA dispatch key.

So, how can I run this model on a CPU?

Below is the output:

NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
query : shape=(19, 729, 16, 72) (torch.bfloat16)
key : shape=(19, 729, 16, 72) (torch.bfloat16)
value : shape=(19, 729, 16, 72) (torch.bfloat16)
attn_bias : <class 'NoneType'>
p : 0.0
ckF is not supported because:
device=cpu (supported: {'cuda'})
bf16 is only supported on A100+ GPUs
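One possible workaround (not from this thread, so treat it as a sketch): on CPU you can fall back to PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention`, which has a CPU kernel, instead of xformers' `memory_efficient_attention`. The two APIs use different tensor layouts, so the wrapper below transposes in and out; the `attention` helper name and the small demo shapes are illustrative, not part of the DeepSeek-VL2 code.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, attn_bias=None, p=0.0):
    """CPU-friendly stand-in for xformers.ops.memory_efficient_attention.

    xformers takes tensors shaped (batch, seq_len, heads, head_dim), while
    F.scaled_dot_product_attention expects (batch, heads, seq_len, head_dim),
    so we transpose on the way in and back out.
    """
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias, dropout_p=p)
    return out.transpose(1, 2)

# The failing call above used shape (19, 729, 16, 72); small shapes here
# keep the demo cheap. float32 works on any CPU; bf16 needs a newer torch.
q = k = v = torch.randn(2, 8, 4, 16)
out = attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 4, 16])
```

Note that `scaled_dot_product_attention` requires PyTorch 2.0+; on older versions you would have to write the softmax(QK^T/sqrt(d))V computation by hand.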

@JosTheBossX

Did you solve this?

@pappukrs

Have you solved this?

@saart

saart commented Feb 1, 2025

I opened a PR to solve it: #48
