Convert model.bin (fp32) to model.bin (int8) #1761

Open
aryan1165 opened this issue Aug 24, 2024 · 4 comments

Comments

@aryan1165

I have a pretrained model.bin file that was earlier converted with the OpenNMT-py converter using fp32 quantisation. Now I want to reduce the size of the model and thought of quantising it to int8. However, I was only able to find ways to quantise it when loading the model; I could not find how to save the quantised model for further use.

Any idea how it can be achieved?
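
For context, this is the load-time quantisation I found; a minimal sketch assuming a CTranslate2 translation model (the model path and tokens are placeholders):

```python
import ctranslate2

# Load the fp32 model.bin and let CTranslate2 quantise the weights
# to int8 in memory. This helps at runtime, but the file on disk
# stays fp32, so the model size is not reduced.
translator = ctranslate2.Translator(
    "ende_ctranslate2/",   # placeholder path to the converted model
    device="cpu",
    compute_type="int8",   # quantisation happens only at load time
)

result = translator.translate_batch([["▁Hello", "▁world"]])
print(result[0].hypotheses[0])
```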

@minhthuc2502
Collaborator

The model.bin format stores the weights and their sizes contiguously in a fairly rigid layout. Currently, there is no way to save the quantized weights back to disk after loading the model.

@aryan1165
Author

But when converting the model from, let's say, OpenNMT-py, there is a way to quantize the model and save it. I need this to decrease the size of the model.bin file. Wouldn't it be possible to develop an API that quantises an existing model and saves it as a new model.bin file? That seems logical to me, since this is available in most other frameworks that support quantisation. An example of the existing conversion-time path is shown below.
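
For comparison, conversion-time quantisation already works through the existing converter API; a sketch with placeholder paths:

```python
import ctranslate2.converters

# Re-run the conversion from the original OpenNMT-py checkpoint,
# this time with int8 quantisation, so the saved model.bin is int8.
# "model.pt" and "model_int8/" are placeholder paths.
converter = ctranslate2.converters.OpenNMTPyConverter(model_path="model.pt")
converter.convert(output_dir="model_int8/", quantization="int8")
```

This works when the original checkpoint is still available; the request here is for the same result starting from an already-converted model.bin.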

@aryan1165
Author

I think it could easily be implemented using the existing code, but I cannot figure out how to get the model_spec of an existing model.bin. Once that is available, the converter code could be modified to save the quantised model again, as in the sketch below.
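
Something like this hypothetical sketch is what I have in mind (load_model_spec is a made-up name for illustration; no such loader exists in CTranslate2 today, while the validate/optimize/save steps mirror what the existing converters already do with a spec):

```python
import ctranslate2.specs

# Hypothetical: rebuild a ModelSpec from an existing model.bin.
# load_model_spec does NOT exist in CTranslate2; it is the missing piece.
spec = ctranslate2.specs.load_model_spec("model_fp32/")

# These steps follow the existing converter flow for saving a spec.
spec.validate()
spec.optimize(quantization="int8")
spec.save("model_int8/")
```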

@minhthuc2502
Collaborator

minhthuc2502 commented Aug 27, 2024

We could develop a new feature to save the quantized tensors in a new model.bin, but it isn't simple (it would require a new converter that loads the model from model.bin back into a spec). Currently, we have no plan to do it.
