add GPU quantization support
pommedeterresautee released this 08 Dec 22:46
- support INT-8 quantization on GPU
- add a tutorial to perform quantization end to end
- add QDQRoberta model
- switch to ONNX opset 13
- refactor TensorRT engine creation
- fix bugs
- add auth token support (for private Hugging Face Hub repos)
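For readers new to QDQ (quantize-dequantize) INT-8 quantization, here is a minimal pure-Python sketch of what a pair of ONNX `QuantizeLinear`/`DequantizeLinear` nodes computes. It is not the code shipped in this release. The function names are hypothetical, and the sketch assumes symmetric quantization (zero point fixed to 0) with a max-absolute-value scale computed per output channel. Per-channel scales rely on the `axis` attribute that these operators gained in opset 13.

```python
from typing import List, Tuple

INT8_MIN, INT8_MAX = -128, 127


def symmetric_scale(values: List[float]) -> float:
    """Hypothetical helper: scale that maps the largest |value| to 127 (zero point 0)."""
    amax = max(abs(v) for v in values)
    return amax / INT8_MAX if amax > 0 else 1.0


def quantize_linear(values: List[float], scale: float, zero_point: int = 0) -> List[int]:
    """Mirrors ONNX QuantizeLinear for int8: saturate(round(x / scale) + zero_point).

    Python's round() rounds half to even, which is the tie-breaking rule ONNX specifies.
    """
    return [max(INT8_MIN, min(INT8_MAX, round(v / scale) + zero_point)) for v in values]


def dequantize_linear(q: List[int], scale: float, zero_point: int = 0) -> List[float]:
    """Mirrors ONNX DequantizeLinear: (q - zero_point) * scale."""
    return [(x - zero_point) * scale for x in q]


def fake_quantize_per_channel(
    weights: List[List[float]],
) -> Tuple[List[List[float]], List[float]]:
    """QDQ round trip with one scale per row (output channel).

    Returns the dequantized weights and the per-row scales; each value
    differs from the original by at most scale / 2.
    """
    dequantized, scales = [], []
    for row in weights:
        s = symmetric_scale(row)
        dequantized.append(dequantize_linear(quantize_linear(row, s), s))
        scales.append(s)
    return dequantized, scales


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [10.0, -2.0, 3.0]]
    deq, scales = fake_quantize_per_channel(w)
    print(scales)
    print(deq)
```

Giving each channel its own scale keeps a row with small weights (the first one above) from being crushed by the range of a row with large weights, which is the usual motivation for per-channel weight quantization.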
What's Changed
- Update triton by @pommedeterresautee in #11
- fix README.md by @pommedeterresautee in #13
- Fix install errors by @sam-writer in #20
- Add auth token by @sam-writer in #19
- Support GPU INT-8 quantization by @pommedeterresautee in #15
New Contributors
- @sam-writer made their first contribution in #20
Full Changelog: v0.1.1...v0.2.0