The behavior of the Multi-stream scheduler is now identical to the Elastic scheduler, and the old Multi-stream scheduler has been removed.
NLP pipelines for question answering, text classification, and token classification upgraded to improve accuracy and better match the SparseML training pathways.
Updates made across the repository for new SparseZoo Python APIs.
Max torchvision version increased to 0.12.0 for computer vision deployment pathways.
Performance:
Inference performance improvements for
unstructured sparse quantized Transformer models.
slow activation functions (such as Gelu or Swish) when they follow a QuantizeLinear operator.
some sparse 1D convolutions. Speedups of up to 3x are observed.
Squeeze, when operating on a single axis.
Resolved Issues:
Assertion errors no longer occur when one node has multiple inputs that both come from the same node.
An assertion error no longer occurs when a MatMul operator follows a Transpose or Reshape operator.
Pipelines now support hyphenated versions of standard task names such as question-answering.
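As an illustration of what accepting hyphenated task names can look like, here is a minimal sketch that maps a hyphenated name to an underscored canonical form before lookup. The helper names (`normalize_task_name`, `resolve_task`) and the `SUPPORTED_TASKS` set are hypothetical and are not taken from the DeepSparse code base; the three task names come from the NLP pipelines listed above.

```python
# Hypothetical sketch: NOT the library's actual implementation.
# Shows one way hyphenated task names could be accepted alongside
# underscored ones by normalizing before lookup.

# Assumed canonical names, based on the NLP pipelines named in these notes.
SUPPORTED_TASKS = {
    "question_answering",
    "text_classification",
    "token_classification",
}


def normalize_task_name(task: str) -> str:
    """Return a canonical form: trimmed, lowercase, hyphens as underscores."""
    return task.strip().lower().replace("-", "_")


def resolve_task(task: str) -> str:
    """Resolve a user-supplied task name to its canonical form or raise."""
    canonical = normalize_task_name(task)
    if canonical not in SUPPORTED_TASKS:
        raise ValueError(f"Unknown task: {task!r}")
    return canonical
```

With this sketch, `resolve_task("question-answering")` and `resolve_task("question_answering")` resolve to the same canonical name, so either spelling selects the same pipeline.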
Known Issues:
In the C++ interface, the engine will crash with a segmentation fault when the num_streams provided to the engine_context_t is greater than the number of physical CPU cores.
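Until this is fixed, a caller can guard against the crash by capping the requested stream count at the number of physical cores before it reaches the engine. The following is a minimal sketch of that guard, written in Python for brevity; the function `clamp_num_streams` is hypothetical and not part of any DeepSparse API, and the physical core count is passed in explicitly because its source depends on the platform.

```python
# Hypothetical guard: NOT part of the engine's API.
# Caps a requested stream count at the number of physical CPU cores.


def clamp_num_streams(requested: int, physical_cores: int) -> int:
    """Return a stream count in the range [1, physical_cores]."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    # Assumption for this sketch: requests below 1 are raised to 1.
    return max(1, min(requested, physical_cores))
```

Note that Python's `os.cpu_count()` reports logical CPUs, which can exceed the physical core count on machines with simultaneous multithreading. A physical count has to come from elsewhere, for example the third-party `psutil.cpu_count(logical=False)`.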