Releases · neuralmagic/deepsparse
DeepSparse v0.12.2 Patch Release
This is a patch release for 0.12.0 that contains the following changes:
- Protobuf is restricted to versions < 4.0, as newer versions break ONNX.
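A quick runtime check that an environment honors the pin (a minimal sketch; only the standard `google.protobuf` version attribute is assumed):

```python
# Verify the installed protobuf is below 4.0, the range compatible with ONNX here.
import google.protobuf

major = int(google.protobuf.__version__.split(".")[0])
assert major < 4, f"protobuf {google.protobuf.__version__} is too new; install protobuf<4.0"
```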
DeepSparse v0.12.1 Patch Release
This is a patch release for 0.12.0 that contains the following changes:
- Improper label mapping no longer crashes for validation flows within DeepSparse transformers.
- DeepSparse Server now exposes proper routes for SageMaker.
- A dependency issue with DeepSparse Server is resolved: an old version of a library that caused crashes in some use cases is no longer installed.
DeepSparse v0.12.0
New Features:
Documentation:
- SparseServer.UI: a Streamlit app that deploys the DeepSparse Server for exploring the inference performance of BERT on the question answering task.
- DeepSparse Server README: `deepsparse.server` capabilities, including single-model and multi-model inferencing.
- Twitter NLP Inference Examples added.
Changes:
Performance:
- Speedup for large batch sizes when using sync mode on AMD EPYC processors.
- AVX2 improvements for:
  - Up to 40% speedup out of the box for dense quantized models.
  - Up to 20% speedup for pruned quantized BERT, ResNet-50, and MobileNet.
- Speedup from sparsity realized for ConvInteger operators.
- Model compilation time decreased on systems with many cores.
- Multi-stream Scheduler: certain computations that were executed during runtime are now precomputed.
- Hugging Face Transformers integration updated to latest state from upstream main branch.
Documentation:
- DeepSparse README: references to `deepsparse.server`, `deepsparse.benchmark`, and Transformer pipelines.
- DeepSparse Benchmark README: highlights of the `deepsparse.benchmark` CLI command.
- Transformers 🤗 Inference Pipelines: examples included on how to run inference via Python for several NLP tasks (see the sketch below).
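As a rough illustration of the pipeline usage these docs describe (a minimal sketch; the task string, keyword arguments, and the SparseZoo stub below are assumptions that may differ by version):

```python
# Question-answering inference through the DeepSparse transformers pipeline.
from deepsparse.transformers import pipeline

qa_pipeline = pipeline(
    task="question-answering",
    model_path="zoo:some/question_answering/stub",  # hypothetical SparseZoo stub
)

# Run a single QA query; the pipeline handles tokenization and engine execution.
print(qa_pipeline(question="What runs the model?", context="The DeepSparse Engine runs the model on CPUs."))
```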
Resolved Issues:
- When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine no longer disables optimizations and suffers very poor performance.
- Users executing `arch.bin` now receive a correct architecture profile of their system.
Known Issues:
- When running the DeepSparse Engine on a system with a nonuniform system topology, for example, an AMD EPYC processor where some cores per core-complex (CCX) have been disabled, model compilation will never terminate. A workaround is to set the environment variable `NM_SERIAL_UNIT_GENERATION=1`, as sketched below.
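Setting the variable before the import is the safe pattern, so the engine sees it at load time (a minimal sketch; the variable name comes from the note above, everything else is illustrative):

```python
# Workaround for non-terminating model compilation on nonuniform topologies:
# set NM_SERIAL_UNIT_GENERATION before deepsparse is imported.
import os

os.environ["NM_SERIAL_UNIT_GENERATION"] = "1"

from deepsparse import compile_model  # import only after setting the variable
```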
DeepSparse v0.11.2 Patch Release
This is a patch release for 0.11.0 that contains the following changes:
- Fixed an assertion error that would occur when using `deepsparse.benchmark` on AMD machines with the argument `-pin none`.
Known Issues:
- When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine will disable optimizations and see very poor performance; a padding workaround is sketched below.
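Until the fix lands, one way to stay on the optimized path is to pad the tokenizer's sequence length up to a multiple of 4 (a minimal sketch; the helper name is ours):

```python
# Round a sequence length up to the next multiple of 4 so the quantized
# BERT optimizations stay enabled on this release.
def pad_to_multiple(n: int, multiple: int = 4) -> int:
    return ((n + multiple - 1) // multiple) * multiple

assert pad_to_multiple(126) == 128  # padded up
assert pad_to_multiple(128) == 128  # already aligned
```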
DeepSparse v0.11.1 Patch Release
This is a patch release for 0.11.0 that contains the following changes:
- When running NanoDet-Plus-m, the DeepSparse Engine will no longer fail with an assertion (See #279).
- The DeepSparse Engine now respects the CPU affinity set by the calling thread. This is essential for the new command-line (CLI) tool `multi-process-benchmark.py` to function correctly; the script allows users to measure performance using multiple separate processes in parallel (see the sketch after this list).
- Fixed a performance regression on BERT batch size 1, sequence length 128 models.
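Because the engine inherits the caller's affinity, a process can pin itself before compiling a model (a minimal sketch; Linux-only since it relies on `os.sched_setaffinity`, and the ONNX path is a placeholder):

```python
import os

# Pin this process to cores 0-3; the engine compiled afterwards respects
# that affinity instead of spreading work across all cores.
os.sched_setaffinity(0, {0, 1, 2, 3})  # Linux-only call

from deepsparse import compile_model

engine = compile_model("model.onnx", batch_size=1)  # placeholder model path
```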
DeepSparse v0.11.0
New Features:
- High-performance sparse quantized convolutional neural networks supported on AVX2 systems.
- CCX detection added to the DeepSparse Engine for AMD systems.
- `deepsparse.server` integration and CLIs added with Hugging Face transformers pipelines support.
Changes:
- Performance improvements made for:
  - FP32 sparse BERT models
  - batch size 1 networks
  - quantized sparse BERT models
  - pooling operations
Resolved Issues:
- When hyperthreads are disabled in the BIOS, core/socket information on certain systems can now be detected.
- Hugging Face transformers validation flows for QQP now giving correct accuracy metrics.
- PyTorch downloads for YOLO model stubs are now supported.
Known Issues:
- When running NanoDet-Plus-m, the DeepSparse Engine will fail with an assertion (See #279). A hotfix is being pursued.
DeepSparse v0.10.0
New Features:
- Quantization support enabled on AVX2 instruction set for GEMM and elementwise operations.
- `NM_SPOOF_ARCH` environment variable added for testing different architectural configurations.
- Elastic scheduler implemented as an alternative to the single-stream or multi-stream schedulers (see the sketch after this list).
- `deepsparse.benchmark` application is now usable from the command line after installing deepsparse, to simplify benchmarking.
- `deepsparse.server` CLI and API added with transformers support to make serving models like BERT with pipelines easy.
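As a rough sketch combining two of these features, an architecture can be spoofed for testing and the elastic scheduler selected at compile time (the `NM_SPOOF_ARCH` value, the `scheduler` keyword, and its accepted strings are assumptions; the model path is a placeholder):

```python
import os

# NM_SPOOF_ARCH is read when deepsparse initializes, so set it before the import.
os.environ["NM_SPOOF_ARCH"] = "avx2"  # illustrative value; consult the engine docs

from deepsparse import compile_model

# "elastic" is assumed to be selectable alongside the single-stream and
# multi-stream schedulers mentioned above.
engine = compile_model("model.onnx", batch_size=16, scheduler="elastic")
```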
Changes:
- More robust architecture detection added to help resolve CPU topology, such as when running inside a virtual machine.
- Tensor columns improved, leading to significant speedups from 5 to 20% in pruned YOLO (larger batch size), BERT (smaller batch size), MobileNet, and ResNet models.
- Sparse quantized network performance improved on machines that do not support VNNI instructions.
- Performance improved for dense BERT with large batch sizes.
Resolved Issues:
- Possible crashes eliminated for:
- Pooling operations with small image sizes
- Rarely, networks containing convolution or GEMM operations
- Some models with many residual connections
Known Issues:
- None
DeepSparse v0.9.1 Patch Release
This is a patch release for 0.9.0 that contains the following changes:
- YOLACT models and other models with constant outputs no longer fail with a mismatched shape error on multi-socket systems with batch sizes greater than 1. However, a corner case exists where a model with a constant output whose first dimension is equal to the (nonunit) batch size will fail.
- GEMM operations where the number of columns of the output matrix is not divisible by 16 will no longer fail with an assertion error.
- Broadcasted inputs to elementwise operators no longer fail with an assertion error.
- Int64 multiplications no longer fail with an illegal instruction on AVX2.
DeepSparse v0.9.0
New Features:
- Optimized support for Resize operators with the coordinate transformation modes `pytorch_half_pixel` and `align_corners` (see the sketch after this list).
- Up-to-date version check implemented for DeepSparse.
- YOLACT and DeepSparse integration added in examples/dbolya-yolact.
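For reference, the coordinate transformation is an attribute on the ONNX Resize node, so affected models are easy to spot (a minimal sketch using the standard `onnx` helper):

```python
# Build a Resize node that uses the pytorch_half_pixel coordinate
# transformation, one of the modes this release optimizes.
from onnx import helper

resize = helper.make_node(
    "Resize",
    inputs=["X", "roi", "scales"],
    outputs=["Y"],
    mode="linear",
    coordinate_transformation_mode="pytorch_half_pixel",
)
print(resize)
```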
Changes:
- The parameter for the number of sockets to use has been removed -- the Python interface now takes only the number of cores as a parameter (see the sketch after this list).
- Tensor columns have been optimized. Users will see performance improvements specifically for pruned quantized BERT models:
  - The softmax operator can now take advantage of tensor columns.
  - Inference batch sizes that are not divisible by 16 are now supported.
- Various performance improvements made to:
  - certain networks, such as YOLOv5, on AVX2 systems
  - dense convolutions on some AVX-512 systems
- API docs recompiled.
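As noted above, the core count is now the only placement parameter on the Python interface (a minimal sketch; the model path is a placeholder):

```python
# Sockets are no longer specified; only the number of cores is passed.
from deepsparse import compile_model

engine = compile_model("model.onnx", batch_size=1, num_cores=4)  # placeholder path
```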
Resolved Issues:
- In rare circumstances, users could have experienced an assertion error when executing networks with depthwise convolutions.
Known Issues:
- YOLACT models fail with a mismatched shape error on multi-socket systems with batch size greater than 1. This issue applies to any model with a constant output.
- In some circumstances, GEMM operations where the number of columns of the output matrix is not divisible by 16 may fail with an assertion error.
DeepSparse v0.8.0
New Features:
- Tensor columns have been optimized, improving the performance of some networks.
  - This includes but is not limited to pruned and quantized YOLOv5s and BERT.
  - Applies to networks with subgraphs composed of low-compute operations.
  - Batch size must be a multiple of 16 (see the sketch after this list).
- Reduce operators have been further optimized in the Engine.
- C++ API support is available for the DeepSparse Engine.
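Given the multiple-of-16 requirement, smaller batches can be padded up to a qualifying size and the extra rows discarded after inference (a minimal NumPy sketch; the helper name is ours):

```python
import numpy as np

# Pad a batch up to a multiple of 16 so tensor-column optimizations apply.
def pad_batch(x: np.ndarray, multiple: int = 16) -> np.ndarray:
    pad = (-x.shape[0]) % multiple
    if pad:
        x = np.concatenate([x, np.zeros((pad, *x.shape[1:]), dtype=x.dtype)])
    return x

batch = pad_batch(np.random.rand(10, 3, 224, 224).astype(np.float32))
assert batch.shape[0] == 16  # 10 real rows plus 6 padding rows
```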
Changes:
- Performance improvements made for low-precision (8- and 16-bit) datatypes on AVX2.
Resolved Issues:
- Rarely, when several data arrangement operators were in a row, e.g., Reshape, Transpose, or Slice, assertion errors occurred.
- When Pad operators were not followed by convolution or pooling, assertion errors occurred.
- CPU threads migrated between cores when running benchmarks.
Known Issues:
- None