-
I got the following warning and error with this LLM: TheBloke/dolphin-2.6-mistral-7B-dpo-laser-GPTQ. The other two LLMs I tried did not produce this error: TheBloke/NeuralBeagle14-7B-GPTQ and TheBloke/Llama-2-7b-Chat-GPTQ.

The warning:
The exllama kernel for GPTQ requires a float16 input activation, while torch.float32 was passed. Casting to float16.

Then it runs into an error:
File (linear.py:114), in Linear.forward(self, input)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != struct c10::BFloat16
-
Changing line 59 in prompt_compressor.py to
torch_dtype=torch.float16 if device_map == "cuda" else torch.float32,
solves the problem. A sketch of what the fixed loading call could look like is below.
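For context, here is a minimal sketch of a model-loading call with that conditional dtype, assuming the model is loaded via transformers' AutoModelForCausalLM; everything other than the torch_dtype line and the device_map variable is illustrative and not taken from prompt_compressor.py.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model that triggered the dtype mismatch in the question (GPTQ, exllama kernel).
model_name = "TheBloke/dolphin-2.6-mistral-7B-dpo-laser-GPTQ"
device_map = "cuda"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device_map,
    # Force float16 on GPU so the exllama GPTQ kernel gets the input
    # activation dtype it expects; fall back to float32 on CPU.
    torch_dtype=torch.float16 if device_map == "cuda" else torch.float32,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

The key point is avoiding a bfloat16/float32 default: the GPTQ exllama kernel only accepts float16 activations, so loading the model in float16 on CUDA keeps mat1 and mat2 in the same dtype.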