Releases: neuralmagic/deepsparse

DeepSparse v0.12.2 Patch Release

02 Jun 14:38 · 73ffe09

This is a patch release for 0.12.0 that contains the following changes:

  • Protobuf is restricted to version < 4.0 as the newer version breaks ONNX.
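As a quick sanity check, the pin can be verified from Python. This is an illustrative sketch only; it assumes the packaging library is available in the environment:

    from importlib.metadata import version

    from packaging.version import Version

    # Confirm the installed protobuf satisfies the < 4.0 restriction noted above.
    installed = version("protobuf")
    assert Version(installed) < Version("4.0"), (
        f"protobuf {installed} is too new for this release; install protobuf<4.0"
    )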

DeepSparse v0.12.1 Patch Release

05 May 18:25 · b030881

This is a patch release for 0.12.0 that contains the following changes:

  • Improper label mapping no longer causes crashes in validation flows within DeepSparse transformers.
  • DeepSparse Server now exposes the proper routes for SageMaker.
  • Fixed a DeepSparse Server dependency issue that installed an old version of a library, causing crashes in some use cases.

DeepSparse v0.12.0

22 Apr 13:44 · 01a427a

Performance:

  • Speedup for large batch sizes when using sync mode on AMD EPYC processors.
  • AVX2 improvements:
    • Up to 40% speedup out of the box for dense quantized models.
    • Up to 20% speedup for pruned quantized BERT, ResNet-50, and MobileNet.
  • Speedup from sparsity realized for ConvInteger operators.
  • Model compilation time decreased on systems with many cores.
  • Multi-stream scheduler: certain computations previously executed at runtime are now precomputed.
  • Hugging Face Transformers integration updated to latest state from upstream main branch.

Resolved Issues:

  • When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine no longer disables optimizations and falls back to very poor performance.
  • Users executing arch.bin now receive a correct architecture profile of their system.

Known Issues:

  • When running the DeepSparse engine on a system with a nonuniform system topology, for example, an AMD EPYC processor where some cores per core-complex (CCX) have been disabled, model compilation will never terminate. A workaround is to set the environment variable NM_SERIAL_UNIT_GENERATION=1.
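A minimal sketch of applying the documented workaround from Python, assuming the variable is set before deepsparse is imported (exporting it in the shell works just as well); the model path is a placeholder:

    import os

    # Workaround for nonuniform topologies: force serial unit generation so that
    # model compilation terminates. Must be set before the engine is loaded.
    os.environ["NM_SERIAL_UNIT_GENERATION"] = "1"

    import deepsparse

    engine = deepsparse.compile_model("model.onnx", batch_size=1)  # placeholder model path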

DeepSparse v0.11.2 Patch Release

23 Mar 19:57 · 4bfde08

This is a patch release for 0.11.0 that contains the following changes:

  • Fixed an assertion error that would occur when using deepsparse.benchmark on AMD machines with the argument -pin none.

Known Issues:

  • When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine will disable optimizations, resulting in very poor performance.
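Until a fix lands, one possible workaround is to pad the tokenized sequence length up to a multiple of 4 before handing inputs to the engine. This sketch assumes a Hugging Face tokenizer is used upstream of DeepSparse; the model name is a placeholder:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder model

    text = "DeepSparse runs sparse transformer models on CPUs."
    seq_len = len(tokenizer(text)["input_ids"])
    padded_len = ((seq_len + 3) // 4) * 4  # round up to the next multiple of 4

    encoded = tokenizer(text, padding="max_length", max_length=padded_len, return_tensors="np")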

DeepSparse v0.11.1 Patch Release

21 Mar 13:56 · d16ca23

This is a patch release for 0.11.0 that contains the following changes:

  • When running NanoDet-Plus-m, the DeepSparse Engine no longer fails with an assertion (see #279).
  • The DeepSparse Engine now respects the CPU affinity set by the calling thread. This is essential for the new command-line (CLI) tool multi-process-benchmark.py to function correctly; the script measures performance using multiple separate processes running in parallel (see the sketch after this list).
  • Fixed a performance regression on BERT batch size 1, sequence length 128 models.
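A rough sketch of how a caller might rely on the affinity behavior, pinning a worker process to its own cores before compiling an engine; the core IDs and model path are placeholders, and os.sched_setaffinity is Linux-only:

    import os

    import deepsparse

    # Pin this process to four cores; the engine respects this affinity, so
    # parallel worker processes can each be given a disjoint core set.
    os.sched_setaffinity(0, {0, 1, 2, 3})

    engine = deepsparse.compile_model("model.onnx", batch_size=1, num_cores=4)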

DeepSparse v0.11.0

11 Mar 18:31 · 46810d4

New Features:

  • High-performance sparse quantized convolutional neural networks supported on AVX2 systems.
  • CCX detection added to the DeepSparse Engine for AMD systems.
  • deepsparse.server integration and CLIs added with Hugging Face transformers pipelines support.
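For illustration, a running deepsparse.server instance can be queried over HTTP. The port, route, and payload fields below are assumptions for a question-answering pipeline; check the server's startup output for the exact endpoint:

    import requests

    # Hypothetical endpoint and payload for a question-answering pipeline.
    response = requests.post(
        "http://localhost:5543/predict",
        json={
            "question": "What does DeepSparse accelerate?",
            "context": "DeepSparse accelerates sparse and quantized models on CPUs.",
        },
    )
    print(response.json())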

Changes:

Performance improvements made for:

  • FP32 sparse BERT models
  • batch size 1 networks
  • quantized sparse BERT models
  • pooling operations

Resolved Issues:

  • Core/socket information is now detected correctly on certain systems when hyperthreads are disabled in the BIOS.
  • Hugging Face transformers validation flows for QQP now give correct accuracy metrics.
  • PyTorch downloads for YOLO model stubs are now supported.

Known Issues:

  • When running NanoDet-Plus-m, the DeepSparse Engine will fail with an assertion (See #279). A hotfix is being pursued.

DeepSparse v0.10.0

03 Feb 16:40 · b27fbda

New Features:

  • Quantization support enabled on AVX2 instruction set for GEMM and elementwise operations.
  • NM_SPOOF_ARCH environment variable added for testing different architectural configurations.
  • Elastic scheduler implemented as an alternative to the single-stream and multi-stream schedulers (see the sketch after this list).
  • deepsparse.benchmark application is now usable from the command line after installing deepsparse, simplifying benchmarking.
  • deepsparse.server CLI and API added with transformers support to make serving models like BERT with pipelines easy.
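A minimal sketch of selecting the elastic scheduler through the Python API; the scheduler keyword and its string value are assumptions about the interface, and the model path is a placeholder:

    from deepsparse import compile_model

    # Compile with the elastic scheduler instead of the default single-stream or
    # multi-stream schedulers. The keyword value shown here is an assumption.
    engine = compile_model("model.onnx", batch_size=1, scheduler="elastic")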

Changes:

  • More robust architecture detection added to help resolve CPU topology, such as when running inside a virtual machine.
  • Tensor columns improved, leading to significant speedups of 5 to 20% in pruned YOLO (larger batch sizes), BERT (smaller batch sizes), MobileNet, and ResNet models.
  • Sparse quantized network performance improved on machines that do not support VNNI instructions.
  • Performance improved for dense BERT with large batch sizes.

Resolved Issues:

  • Possible crashes eliminated for:
    • Pooling operations with small image sizes
    • Rarely, networks containing convolution or GEMM operations
    • Some models with many residual connections

Known Issues:

  • None

DeepSparse v0.9.1 Patch Release

14 Dec 22:21 · ed22c2c

This is a patch release for 0.9.0 that contains the following changes:

  1. YOLACT models and other models with constant outputs no longer fail with a mismatched shape error on multi-socket systems with batch sizes greater than 1. However, a corner case exists where a model with a constant output whose first dimension is equal to the (nonunit) batch size will fail.
  2. GEMM operations where the number of columns of the output matrix is not divisible by 16 will no longer fail with an assertion error.
  3. Broadcasted inputs to elementwise operators no longer fail with an assertion error.
  4. Int64 multiplications no longer fail with an illegal instruction on AVX2.

DeepSparse v0.9.0

01 Dec 16:05 · 74558ca

New Features:

  • Optimized support for Resize operators with the pytorch_half_pixel and align_corners coordinate transformation modes (see the sketch after this list).
  • Up-to-date version check implemented for DeepSparse.
  • YOLACT and DeepSparse integration added in examples/dbolya-yolact.
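For reference, this is what a Resize node using one of the newly optimized coordinate transformation modes looks like when built with the onnx helper API; tensor names and attribute values are illustrative:

    from onnx import helper

    # A Resize node with the pytorch_half_pixel coordinate transformation mode,
    # one of the modes whose support is optimized in this release.
    resize_node = helper.make_node(
        "Resize",
        inputs=["X", "roi", "scales"],
        outputs=["Y"],
        mode="linear",
        coordinate_transformation_mode="pytorch_half_pixel",
    )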

Changes:

  • The parameter for the number of sockets to use has been removed; the Python interface now takes only the number of cores as a parameter (see the sketch after this list).
  • Tensor columns have been optimized. Users will see performance improvements specifically for pruned quantized BERT models:
    • The softmax operator can now take advantage of tensor columns.
    • Inference batch sizes that are not divisible by 16 are now supported.
  • Various performance improvements made to:
    • certain networks, such as YOLOv5, on AVX2 systems.
    • dense convolutions on some AVX-512 systems.
  • API docs recompiled.
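A short sketch of the updated interface, with the sockets parameter gone and only a core count supplied; the model path, batch size, and core count are placeholders:

    from deepsparse import compile_model

    # The number-of-sockets parameter has been removed; the core count is the
    # only resource knob passed here.
    engine = compile_model("model.onnx", batch_size=16, num_cores=8)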

Resolved Issues:

  • In rare circumstances, users could have experienced an assertion error when executing networks with depthwise convolutions.

Known Issues:

  • YOLACT models fail with a mismatched shape error on multi-socket systems with batch size greater than 1. This issue applies to any model with a constant output.
  • In some circumstances, GEMM operations where the number of columns of the output matrix is not divisible by 16 may fail with an assertion error.

DeepSparse v0.8.0

26 Oct 15:03 · 60a905c

New Features:

  • Tensor columns have been optimized, improving the performance of some networks.
    • This includes but is not limited to pruned and quantized YOLOv5s and BERT.
    • The optimization applies to networks with subgraphs composed of low-compute operations.
    • Batch size must be a multiple of 16.
  • Reduce operators have been further optimized in the Engine.
  • C++ API support is available for the DeepSparse Engine.

Changes:

  • Performance improvements made for low-precision (8- and 16-bit) datatypes on AVX2.

Resolved Issues:

  • Rarely, when several data arrangement operators were in a row, e.g., Reshape, Transpose, or Slice, assertion errors occurred.
  • When Pad operators were not followed by convolution or pooling, assertion errors occurred.
  • CPU threads migrated between cores when running benchmarks.

Known Issues:

  • None