
Releases: neuralmagic/deepsparse

DeepSparse v0.7.0

13 Sep 16:45
c325595

New Features:

  • Operators optimized for Engine support:
    • Where*
    • Cast*
    • IntegerMatMul*
    • QLinearMatMul*
    • Gather (for scalar indices)
    *Optimized only for AVX-512 support.
  • Flag created to disable any batch size overrides; set the environment variable NM_DISABLE_BATCH_OVERRIDE=1 to enable it (a usage sketch follows this list).
  • Warnings display when emulating quantized operations on machines without VNNI instructions.
  • Support added for Python 3.9.
  • Support added for ONNX versions 1.8 - 1.10.
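
A minimal usage sketch for the NM_DISABLE_BATCH_OVERRIDE flag, assuming the flag is read when the engine is compiled; "model.onnx" stands in for a real model path:

    import os

    # Assumption: the flag must be set before compiling with deepsparse.
    os.environ["NM_DISABLE_BATCH_OVERRIDE"] = "1"

    from deepsparse import compile_model

    # Compile without DeepSparse overriding the model's batch dimension.
    engine = compile_model("model.onnx", batch_size=1)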

Changes:

  • Performance improvements made for sparse quantized transformer models.
  • Documentation updates made for examples/ultralytics-yolo to include YOLOv5.

Resolved Issues:

  • A crash could result from an uninitialized memory read. A check is now in place before the memory is accessed.
  • Engine output_shape functions corrected on multi-socket systems when the output dimensions are not statically known.

Known Issues:

  • BERT models with quantized embeds currently segfault on AVX2 machines. Workaround is to run on a VNNI-compatible machine.
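
A defensive check for this issue, assuming the cpu_vnni_compatible helper in deepsparse.cpu (the repository's cpu.py):

    from deepsparse.cpu import cpu_vnni_compatible

    # Avoid the AVX2 segfault: only load BERT models with quantized
    # embeds on machines with VNNI instructions.
    if not cpu_vnni_compatible():
        raise RuntimeError("This model requires a VNNI-compatible CPU")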

DeepSparse v0.6.1 Patch Release

11 Aug 16:15
2b605ef

This is a patch release for 0.6.0 that contains the following changes:

Users no longer experience crashes:

  • when running the ReduceSum operation in the DeepSparse Engine.
  • when running operations on tensors that are 8- or 16-bit integers, or booleans, on AVX2.

DeepSparse v0.6.0

30 Jul 23:03
08a210c

New Features:

  • None

Changes:

  • Performance improvements made for:
    - all networks when running on multi-socket machines, especially those with large outputs.
    - batched Softmax and Reduce operators with many threads available.
    - Reshape operators when multiple dimensions are combined into one or one dimension is split into multiple.
    - stacked matrix multiplications by supporting more input layouts.
  • YOLOv3 example integration was generalized to ultralytics-yolo in support of both V3 and V5.

Resolved Issues:

  • Engine now runs on architectures with more than one NUMA node per socket.

Known Issues:

  • None

DeepSparse v0.5.1 Patch Release

30 Jun 16:23
8e0242b

This is a patch release for 0.5.0 that contains the following changes:

  • Resolved an issue that caused a performance regression on YOLOv5 and could have affected the correctness of some models.

DeepSparse v0.5.0

28 Jun 18:01
ec3c3f4

New Features:

  • None

Changes:

  • Performance optimizations implemented for binary elementwise operations where both inputs come from the same source buffer; one of the inputs may pass through intermediate unary operations (e.g., x * sigmoid(x), as in swish).
  • Performance optimizations implemented for binary elementwise operations where one of the inputs is a constant scalar.
  • Small performance improvement for large batch sizes (> 64) on quantized ResNet.

Resolved Issues:

  • An assertion in deepsparse num_sockets, triggered when too many sockets were requested and causing users to experience a crash, has been removed.
  • Rare assertion failure fixed when a nonlinearity appeared between an elementwise addition and a convolution or gemm.
  • Broken URLs for classification and detection examples updated in the contained READMEs.

Known Issues:

  • None

DeepSparse v0.4.0

04 Jun 20:52
16c7915

New Features:

  • New operator support implemented for Expand.
  • Slice operator support added for positive step sizes; only slice operations on a single axis are supported. Previously, Slice was supported only for constant tensors with a step size of one.
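
For illustration, a minimal sketch of the newly supported case, built with the standard onnx Python package: a single-axis Slice with a positive, non-unit step (equivalent to data[0:6:2] in NumPy):

    import numpy as np
    import onnx
    from onnx import helper, numpy_helper

    # Slice rows 0..5 of a 6x4 tensor with step 2 along axis 0.
    node = helper.make_node(
        "Slice", ["data", "starts", "ends", "axes", "steps"], ["out"]
    )
    graph = helper.make_graph(
        [node],
        "single_axis_slice",
        inputs=[helper.make_tensor_value_info("data", onnx.TensorProto.FLOAT, [6, 4])],
        outputs=[helper.make_tensor_value_info("out", onnx.TensorProto.FLOAT, [3, 4])],
        initializer=[
            numpy_helper.from_array(np.array([0], dtype=np.int64), "starts"),
            numpy_helper.from_array(np.array([6], dtype=np.int64), "ends"),
            numpy_helper.from_array(np.array([0], dtype=np.int64), "axes"),
            numpy_helper.from_array(np.array([2], dtype=np.int64), "steps"),
        ],
    )
    onnx.checker.check_model(helper.make_model(graph))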

Changes:

  • Memory usage of compiled models reduced.
  • Memory layout for matrix multiplications in Transformers optimized.
  • Precision for swish and sigmoid operations improved.
  • Runtime performance improved for some networks whose outputs are immediately preceded by transpose operators.
  • Runtime performance of softmax operations improved.
  • Readme redesigned for better clarity on the repository's purpose.

Resolved Issues:

  • When using the multi-stream scheduler, selecting more threads than the number of cores on the system no longer causes a performance hit.
  • Neural Magic dependencies now upgrade to intended bugfix versions instead of minor versions.

Known Issues:

  • None

DeepSparse v0.3.1 Patch Release

14 May 00:02
7ea8298

This is a patch release for 0.3.0 that contains the following changes:

  • Docs updated for new Discourse and Slack links
  • Check added for supported Python version so DeepSparse does not improperly install on unsupported systems

DeepSparse v0.3.0

30 Apr 23:54
54c7027

New Features:

  • Multi-stream scheduler added as a configurable option to the engine.
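
A usage sketch for selecting the scheduler; note that the scheduler keyword and its "multi_stream" value are assumptions based on the deepsparse compile_model API rather than details confirmed in these notes:

    from deepsparse import compile_model

    # Assumption: the scheduler is chosen at compile time via a keyword
    # argument; "model.onnx" is a placeholder path.
    engine = compile_model("model.onnx", batch_size=1, scheduler="multi_stream")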

Changes:

  • Errors related to setting the NUMA memory policy are now issued as warnings.
  • Improved compilation times for sparse networks.
  • Performance improvements made for:
    - networks with large outputs and multi-socket machines.
    - ResNet-50 v1 quantized.
    - kernel sparsity gemms.
  • Copy operations and the placement of quantization operations within networks optimized.
  • Version now loaded from the version.py file; the default build on branches is now nightly.
  • cpu.py file and related APIs added to the DeepSparse repo instead of being copied over from the backend.
  • Install errors added for end users running on unsupported, non-Linux systems.
  • YOLOv3 batch 64 quantized now has a speedup of 16% in the DeepSparse Engine.

Resolved Issues:

  • An assertion is no longer triggered when more sockets or threads than available are requested.
  • Resolved assertion when performing Concat operations on constant buffers.
  • Engine no longer crashes when the output of a QLinearMatMul operation has a dimension not divisible by 4.
  • The engine now starts without crashing on Windows Subsystem for Linux and Docker for Windows or Docker for Mac.

Known Issues:

  • None

DeepSparse v0.2.0

31 Mar 23:11
1852e90

New Features:

  • None

Changes:

  • Dense convolutions on AVX2 systems were optimized, improving performance for many non-pruned networks. In particular, this results in a speed improvement for batch size 64 ResNet-50 of up to 28% on Intel AVX2 systems and up to 39% on AMD AVX2 systems.
  • Operations to shuffle activations in engine optimized, resulting in up to 14% speed improvement for batch size 64 pruned quantized MobileNetV1.
  • Performance improvements made for networks with large output arrays.

Resolved Issues:

  • Engine no longer fails with an assert when running some quantized networks.
  • Some Resize operators were not optimized if they had an ROI input; these operators are now optimized.
  • Memory leak addressed on multi-socket systems when batch size > 1.
  • Docs and readme corrections made for minor issues and broken links.
  • Makefile no longer deletes files for docs compilation and cleaning.

Known Issues:

  • In rare cases where a tensor, used as the input or output to an operation, is larger than 2GB, the engine can segfault. Users should decrease the batch size as a workaround.

  • In some cases, models running complicated pre- or post-processing steps could diminish the DeepSparse Engine performance by up to a factor of 10x due to hyperthreading, as two engine threads can run on the same physical core. Address the performance issue by trying the following recommended solutions in order of preference:

    1. Enable thread binding.

    If that does not give a performance benefit or you want to try additional options:

    2. Use the numactl utility (for example, numactl --physcpubind=<physical cores> <command>) to prevent the process from running on hyperthreads.

    3. Manually set the thread affinity in Python as follows:

    import os
    from deepsparse.cpu import cpu_architecture

    # Detect the CPU so affinity can be set to physical cores only.
    ARCH = cpu_architecture()

    if ARCH.vendor == "GenuineIntel":
        # Intel enumerates physical cores first, so bind the current
        # process (pid 0) to logical CPUs 0..N-1.
        os.sched_setaffinity(0, range(ARCH.num_physical_cores()))
    elif ARCH.vendor == "AuthenticAMD":
        # AMD enumerates hyperthread siblings adjacently, so bind to
        # the even-numbered logical CPUs.
        os.sched_setaffinity(0, range(0, 2 * ARCH.num_physical_cores(), 2))
    else:
        raise RuntimeError(f"Unknown CPU vendor {ARCH.vendor}")

DeepSparse v0.1.1 Patch Release

01 Mar 19:56
4940121

This is a patch release for 0.1.0 that contains the following changes:

  • Docs updates: tagline and overview refreshed; verbiage updated to use "sparsification"
  • Examples updated to use new ResNet-50 pruned_quant moderate model from the SparseZoo
  • Nightly build dependencies now match on major.minor and not full version
  • Benchmarking script added for reproducing ResNet-50 numbers
  • Small (3-5%) performance improvement for pruned quantized ResNet-50 models, for batch size greater than 16
  • Reduced memory footprint for networks with sparse fully connected layers
  • Improved performance on multi-socket systems when batch size is larger than 1