I noticed that leaving MMQ on drastically speeds up prompt processing on my single 3090 GPU. This confuses me, since I'm using K-quants and my GPU has tensor cores, which supposedly should be faster than the MMQ kernels. Am I doing something wrong, or is this the expected behavior?
Edit: Not sure if this matters, but I usually offload only about 60% of the model's layers to VRAM and keep the rest in DDR4 system RAM, so I can run larger models at higher BPWs.
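For context, my invocation looks roughly like this (a minimal sketch; exact flag names vary between llama.cpp versions, and the model path and layer count below are placeholders rather than my exact values):

```
# -ngl / --n-gpu-layers : how many layers to offload to VRAM (~60% of the model here)
# --mul-mat-q           : use the MMQ kernels for quantized matmuls
#                         (on some builds this is already the default)
./main -m models/some-70b.Q4_K_M.gguf \
  --n-gpu-layers 48 \
  --mul-mat-q \
  -p "example prompt" -n 128
```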