inference optimization ⚗
🦿 this release adds support for some features that can make inference faster:
- support for `torch.compile` & optimum ONNX[^1] (see the sketch after this list)
- improved the `textsum-dir` command: more options, streamlined behavior, etc.; added the `fire` package to help with that (a CLI sketch follows this list)
- the saved config `JSON` files are now better structured to keep track of parameters, etc. (an illustrative layout is sketched after this list)
- some small adjustments to the `Summarizer` class
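
For reference, here's a minimal sketch of the two new inference paths using plain `transformers`/`optimum` APIs rather than this package's wrappers; the checkpoint name is just an example:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# path 1: torch.compile (PyTorch 2.x) JIT-compiles the model's forward pass
pt_model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
pt_model = torch.compile(pt_model)

# path 2: ONNX Runtime via optimum; export=True converts the checkpoint on the fly
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)

inputs = tokenizer("some long document text ...", return_tensors="pt")
summary_ids = ort_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```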
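The `fire` integration means CLI flags come straight from a Python function's signature. This is just an illustration of the pattern; `summarize_dir` and its parameters are hypothetical, not the package's actual entry point:

```python
import fire

def summarize_dir(input_dir: str, output_dir: str = "summaries", batch_size: int = 1):
    """Hypothetical entry point: summarize every text file in input_dir."""
    # a real implementation would load a Summarizer and iterate over files here
    print(f"summarizing {input_dir} -> {output_dir} (batch_size={batch_size})")

if __name__ == "__main__":
    # fire maps positional args and --flags onto the signature,
    # e.g. `python cli.py ./docs --output_dir out --batch_size 2`
    fire.Fire(summarize_dir)
```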
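And a sketch of what a better-structured parameter file could look like on disk; the field names and grouping here are assumptions, not the actual schema:

```python
import json
from pathlib import Path

# hypothetical layout: group model, generation, and runtime parameters
params = {
    "model": {"name_or_path": "long-t5-tglobal-base", "onnx": True, "compile": False},
    "generation": {"max_new_tokens": 512, "num_beams": 4},
    "runtime": {"batch_size": 1, "device": "cpu"},
}
Path("summarization_params.json").write_text(json.dumps(params, indent=2))
```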
Next up: the UI app will finally get an overhaul.
[^1]: please note that "support for" is not an equivalent statement to "I have tested every long-context model with ONNX max quantization and sign off guaranteeing they will all provide accurate results". I've had some good results, but also some strange ones (with Long-T5 specifically). Test beforehand, and file an issue on the Optimum repo as needed 🙏
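
If you want to sanity-check a quantized export yourself, one possible workflow uses optimum's dynamic quantization; the quantization preset, paths, and checkpoint here are assumptions, and the exact settings behind "max quantization" may differ:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # example checkpoint

# export the model to ONNX and save it locally
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
model.save_pretrained("onnx_model")

# dynamic (weight-only) quantization; seq2seq exports produce one file per sub-model
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
for onnx_file in ("encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"):
    quantizer = ORTQuantizer.from_pretrained("onnx_model", file_name=onnx_file)
    quantizer.quantize(save_dir="onnx_model_quantized", quantization_config=qconfig)

# reload the quantized files and compare summaries against the fp32 model
# on a few representative long documents before trusting the results
```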