On GPU, TensorRT is the only way to run quantized models.
ONNX Runtime requires you to use the TensorRT execution provider.
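For illustration, here is a minimal sketch of opening an ONNX Runtime session with the TensorRT execution provider; the model path and input shapes below are placeholders, not something taken from this repo:

```python
import numpy as np
import onnxruntime as ort

# Minimal sketch: TensorRT must come first in the provider list so that the
# quantized (INT8) nodes run through TensorRT; CUDA/CPU act as fallbacks.
session = ort.InferenceSession(
    "model.onnx",  # hypothetical path to the quantized ONNX file
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)

# Dummy inputs with illustrative shapes (batch=1, sequence=16).
feed = {
    "input_ids": np.ones((1, 16), dtype=np.int64),
    "attention_mask": np.ones((1, 16), dtype=np.int64),
}
outputs = session.run(None, feed)
```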
So you need to:
1/ extract the ONNX model from sentence-transformers (wrappers are provided in this lib in case you have some difficulties) -- see the export sketch after this list
2/ do the same kind of work as described in the repo on this ONNX file
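For step 1, a minimal export sketch using plain `torch.onnx.export` (the repo's own wrappers may do this differently; the model name and opset here are just examples):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Example checkpoint; any sentence-transformers model with a BERT-like
# encoder can be exported the same way.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

# Dummy input used only to trace the graph.
inputs = tokenizer("a dummy sentence", return_tensors="pt")

torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=13,
)
```

Keep in mind that sentence-transformers usually applies mean pooling (and sometimes normalization) on top of the transformer output; you can re-apply that step outside the ONNX graph or export a small wrapper module that includes it.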
Is this supposed to redo pre-training on the whole dataset in the case of a multilingual model?
It depends ;-)
What I can say is that the official Nvidia doc says only 10%, but that's in CV with large datasets. In a fine-tuning regime, it definitely depends on the size of your dataset.
And regarding whether it's worth doing at all, it's more a question of how much the latency improvement matters to your use case, and of the difficulty of industrializing it vs. other strategies like distillation into MiniLM or some other smaller model, for instance. In our case, at some point we excluded it from our industrialization pipeline.
Hi,
Thanks for the nice repo!
Isn't it possible to obtain an ONNX model from GPU quantization of a sentence-transformers model?
It seems that the end-to-end notebook is based on a TensorRT-quantized model.
Thanks!