Support for gpt2 quantization #52
Indeed, we have not done it yet, but it should be fairly simple. You can call patch_model (https://github.com/ELS-RD/transformer-deploy/blob/main/src/transformer_deploy/QDQModels/patch.py#L44); for an example of a simple module, see https://github.com/ELS-RD/transformer-deploy/blob/main/src/transformer_deploy/QDQModels/QDQAlbert.py. Let me know if this is clear for you.
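[Editor's note: for readers unfamiliar with what "inserting QDQ layers" means, here is a conceptual sketch, not the library's actual AST-based patch mechanism. It uses NVIDIA's pytorch-quantization package, which transformer-deploy builds on: quantization-aware layers such as QuantLinear wrap their inputs and weights in fake-quant (Q/DQ) nodes that later export as QuantizeLinear/DequantizeLinear in ONNX. The helper name below is hypothetical.]

```python
import torch
from pytorch_quantization import nn as quant_nn


def swap_linear_for_quant(module: torch.nn.Module) -> None:
    """Recursively replace nn.Linear with QuantLinear (adds Q/DQ fake-quant nodes).

    NOTE: HuggingFace GPT-2 implements its projections with a custom Conv1D
    module rather than nn.Linear, so a real GPT-2 patch must handle that
    module too - one reason GPT-2 support is not automatic.
    """
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Linear):
            quant_linear = quant_nn.QuantLinear(
                child.in_features, child.out_features, bias=child.bias is not None
            )
            quant_linear.weight = child.weight
            if child.bias is not None:
                quant_linear.bias = child.bias
            setattr(module, name, quant_linear)
        else:
            swap_linear_for_quant(child)
```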
Thank you for your response. I tried to write QDQGPT2.py following the same pattern as QDQBert.py and QDQElectra.py, and added the new patch module to the list in https://github.com/ELS-RD/transformer-deploy/blob/main/src/transformer_deploy/QDQModels/patch.py#L44. I was not able to fully understand how the quantization works, though: I got that you insert the QDQ layers, but I got lost in the code. Afterwards I tried to quantize the GPT2 model, which worked, except that certain layers have an amax value of 'nan'. Then I converted the model to ONNX and TensorRT; both conversions worked. However, in TensorRT the speed is slower than with fp32 precision. Do you have any idea why it is so slow?
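[Editor's note: for context, amax values in pytorch-quantization come from calibrators that observe activations during forward passes; a quantizer that never sees data keeps amax = nan. Below is a minimal sketch of the standard calibration loop from NVIDIA's pytorch-quantization examples, assuming `model` already contains TensorQuantizer modules and `calib_dataloader` (a name introduced here) yields dicts of input tensors.]

```python
import torch
from pytorch_quantization import nn as quant_nn


def calibrate(model: torch.nn.Module, calib_dataloader) -> None:
    # Switch quantizers to calibration mode: collect statistics, no fake-quant.
    for _, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()

    # Feed representative data so every quantizer observes real activations.
    with torch.no_grad():
        for batch in calib_dataloader:
            model(**batch)

    # Compute amax from the collected statistics, then re-enable quantization.
    # Any quantizer that never observed data keeps amax = nan.
    for _, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.load_calib_amax()
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()
```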
Have you built the engine with int8 support?
Yes, I've set both the fp16 and int8 flags.
Basically I used code analogous to your quantization demo, only the model changed. I can share some of my measurements (in seconds, averaged over 20 runs, always on the same sample):
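[Editor's note: the measurements table was lost in extraction. For reference, setting both flags with the TensorRT Python API looks roughly like the sketch below; the surrounding network-definition and serialization steps of the usual build flow are omitted.]

```python
import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
config = builder.create_builder_config()
# With an explicitly quantized (QDQ) network, INT8 must be enabled;
# FP16 lets TensorRT fall back to half precision where int8 is not used.
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)
```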
Have you checked that your local TensorRT version is the same as the one in the Docker image you use?
I am not using any Docker image; everything is installed in a Python virtual environment, a conda environment, or locally, so there shouldn't be any version mismatch.
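[Editor's note: a quick way to confirm which versions are actually in play when checking for a mismatch:]

```python
import tensorrt as trt
import torch

print("TensorRT:", trt.__version__)
print("PyTorch:", torch.__version__)
```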
I tried to quantize (add QDQ layers to) the gpt2 model:
but no QDQ layers were inserted, so I assume you don't support GPT2 yet. Do you plan to add it? (A quick check for missing QDQ layers is sketched below.)
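[Editor's note: one way to verify whether any QDQ layers were actually inserted, assuming the patched model uses pytorch-quantization's TensorQuantizer as the other QDQ modules in this repo do:]

```python
from pytorch_quantization import nn as quant_nn

# Count fake-quant nodes; zero means the patch did not touch the model.
quantizers = [name for name, m in model.named_modules()
              if isinstance(m, quant_nn.TensorQuantizer)]
print(f"{len(quantizers)} TensorQuantizer modules found")
```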