-
I got the following warning and error with this LLM: TheBloke/dolphin-2.6-mistral-7B-dpo-laser-GPTQ. The other two LLMs I tried did not produce this error: TheBloke/NeuralBeagle14-7B-GPTQ and TheBloke/Llama-2-7b-Chat-GPTQ.

The warning:
The exllama kernel for GPTQ requires a float16 input activation, while torch.float32 was passed. Casting to float16.

Then it runs into an error:
File (linear.py:114), in Linear.forward(self, input)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != struct c10::BFloat16
-
Changing line 59 in prompt_compressor.py to
torch_dtype=torch.float16 if device_map == "cuda" else torch.float32,
solves the problem. A sketch of what the fixed loading call could look like is below.
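For context, here is a minimal sketch of a model-loading call with that conditional dtype, assuming the model is loaded via transformers' AutoModelForCausalLM; everything other than the torch_dtype line and the device_map variable is illustrative and not taken from prompt_compressor.py.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model that triggered the dtype mismatch in the question (GPTQ, exllama kernel).
model_name = "TheBloke/dolphin-2.6-mistral-7B-dpo-laser-GPTQ"
device_map = "cuda"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device_map,
    # Force float16 on GPU so the exllama GPTQ kernel gets the input
    # activation dtype it expects; fall back to float32 on CPU.
    torch_dtype=torch.float16 if device_map == "cuda" else torch.float32,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

The key point is avoiding a bfloat16/float32 default: the GPTQ exllama kernel only accepts float16 activations, so loading the model in float16 on CUDA keeps mat1 and mat2 in the same dtype.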