Releases
v0.7.0
New Features:
Operators optimized for Engine support:
Where*
Cast*
IntegerMatMul*
QLinearMatMul*
Gather (for scalar indices)
*optimized only for AVX-512 support
Flag created to disable any batch size overrides; enable it by setting the environment variable NM_DISABLE_BATCH_OVERRIDE=1.
Warnings display when emulating quantized operations on machines without VNNI instructions.
Support added for Python 3.9.
Support added for ONNX versions 1.8 - 1.10.
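As an illustration of the batch override flag listed above, the following minimal sketch sets the variable from Python. Only the variable name and value come from these notes; the helper name disable_batch_override is hypothetical, and the sketch assumes the Engine reads the variable when it is loaded, so the variable is set before any Engine import.

```python
import os


def disable_batch_override(env=None):
    """Set the flag named in the release notes in the given environment mapping.

    `disable_batch_override` is a hypothetical helper for illustration only;
    it is not part of the Engine's API.
    """
    env = os.environ if env is None else env
    env["NM_DISABLE_BATCH_OVERRIDE"] = "1"
    return env


# Assumption: the Engine reads the variable at load time, so set it first
# and import the Engine package only afterwards.
disable_batch_override()
```

The same effect can be had from a shell by exporting NM_DISABLE_BATCH_OVERRIDE=1 before launching the Python process.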
Changes:
Performance improvements made for sparse quantized transformer models.
Documentation updates made for examples/ultralytics-yolo to include YOLOv5.
Resolved Issues:
A crash could result from an uninitialized memory read. A check is now in place before the memory is accessed.
Engine output_shape functions corrected on multi-socket systems when the output dimensions are not statically known.
Known Issues:
BERT models with quantized embeddings currently segfault on AVX2 machines. The workaround is to run on a VNNI-compatible machine.
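To decide whether the workaround above applies, it can help to check whether a machine reports VNNI before running a quantized model. The sketch below is an OS-level check for Linux only and is not the Engine's own detection logic. It assumes the kernel exposes CPU features on the "flags" lines of /proc/cpuinfo and that VNNI appears there as avx512_vnni or avx_vnni. The function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags listed on every 'flags' line of cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_vnni(cpuinfo_text):
    """Return True if any core reports a VNNI flag (assumed Linux flag names)."""
    return bool(cpu_flags(cpuinfo_text) & {"avx512_vnni", "avx_vnni"})


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; return an empty string where it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    if has_vnni(read_cpuinfo()):
        print("VNNI reported by the kernel")
    else:
        print("VNNI not reported (or /proc/cpuinfo unavailable)")
```

On systems without /proc/cpuinfo the check reports no VNNI, which should be read as "unknown" rather than "absent".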