
Performance drop (discrepancies) with model converted through the scripts compared to the official one #1206

Open · bug (Something isn't working)

ZhangPeng4242 opened this issue Feb 23, 2025 · 0 comments
System Info

I am using Next.js 15.

Environment/Platform

  • [x] Website/web-app
  • [ ] Browser extension
  • [ ] Server-side (e.g., Node.js, Deno, Bun)
  • [ ] Desktop app (e.g., Electron)
  • [ ] Other (e.g., VSCode extension)

Description

I have been converting Llama-3.2-1B-Instruct through the official conversion script provided in this repo, and I noticed a significant performance drop with my conversion compared to the official one. Both use the same q4f16 quantization and everything else is identical. Below are the responses to the simple prompt "Tell me a joke."

Official conversion (onnx-community/Llama-3.2-1B-Instruct-q4f16):

[Screenshot of model output]

Another official conversion, https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct, has the same performance-drop issue:

[Screenshot of model output]

My own conversion through the scripts:

[Two screenshots of model output]

How can I solve this? What is wrong in the conversion?

Reproduction

Run the conversion script with:

python -m scripts.convert --quantize --model_id meta-llama/Llama-3.2-1B-Instruct

Then upload the converted model to the Hugging Face Hub and use it for generation with q4f16 quantization. The results differ from those of onnx-community/Llama-3.2-1B-Instruct-q4f16.
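A minimal generation sketch for comparing the two checkpoints side by side, assuming @huggingface/transformers v3; `your-username/Llama-3.2-1B-Instruct` is a hypothetical placeholder for the self-converted upload:

```ts
// Minimal sketch, assuming @huggingface/transformers v3.
// "your-username/Llama-3.2-1B-Instruct" is a hypothetical placeholder
// for the self-converted model uploaded to the Hub.
import { pipeline } from "@huggingface/transformers";

const prompt = [{ role: "user", content: "Tell me a joke." }];

for (const modelId of [
  "onnx-community/Llama-3.2-1B-Instruct-q4f16", // official conversion
  "your-username/Llama-3.2-1B-Instruct",        // self-converted upload (placeholder)
]) {
  // Load each checkpoint with the same q4f16 quantized weights.
  const generator = await pipeline("text-generation", modelId, { dtype: "q4f16" });
  const output = await generator(prompt, { max_new_tokens: 128 });
  // With chat-style input, generated_text is the message list;
  // the last entry is the assistant reply.
  console.log(modelId, "=>", output[0].generated_text.at(-1).content);
}
```

Running both checkpoints through the same pipeline with the same dtype isolates the conversion itself as the only variable.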

ZhangPeng4242 added the bug label on Feb 23, 2025