System Info
I am using Next.js 15.
Environment/Platform
Description
I have been converting Llama-3.2-1B-Instruct using the official conversion scripts provided in this repo, and I noticed a significant performance drop in my conversion compared to the official one. Both use the same q4f16 quantization and otherwise identical settings; the outputs below are each model's answer to the simple prompt "tell me a joke."
official conversion - onnx-community/Llama-3.2-1B-Instruct-q4f16
another (official?) conversion with the same performance drop - https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct
my own conversion through the scripts
How can I fix this? What is wrong with my conversion?
Reproduction
Run the conversion script with `python -m scripts.convert --quantize --model_id meta-llama/Llama-3.2-1B-Instruct`. Then upload the converted model to the Hugging Face Hub and generate with it using q4f16 quantization; the results differ noticeably from onnx-community/Llama-3.2-1B-Instruct-q4f16.
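For completeness, this is roughly how generation is run in the Next.js app (a minimal sketch using the Transformers.js v3 pipeline API; `your-username/Llama-3.2-1B-Instruct` is a placeholder for the uploaded repo id):

```js
import { pipeline } from "@huggingface/transformers";

// Load the uploaded model with the same q4f16 quantization used by the
// official onnx-community conversion. The repo id below is a placeholder.
const generator = await pipeline(
  "text-generation",
  "your-username/Llama-3.2-1B-Instruct",
  { dtype: "q4f16" },
);

// Same prompt used in the comparison above.
const messages = [{ role: "user", content: "tell me a joke" }];
const output = await generator(messages, { max_new_tokens: 128 });

// The returned conversation includes the assistant reply as the last message.
console.log(output[0].generated_text.at(-1).content);
```

Swapping only the repo id between the official q4f16 model and my own conversion, with everything else unchanged, is enough to see the difference in output quality.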