Releases: NVIDIA/TensorRT
23.08
What's Changed
- Fix python bindings build and README
- Add kNATIVE_INSTANCENORM flag to demoDiffusion
- Update demoDiffusion to support torch 2.x and fix typo in README
- Add HuggingFace Stable Diffusion pipeline demo
- Upgrade pytorch-quantization to 2.1.3
Full Changelog: v8.6.1...23.08
TensorRT OSS v8.6.1
TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.
- Updates since TensorRT 8.6.0 EA release.
- Please refer to the TensorRT 8.6.1.6 GA release notes for more information.
Key Features and Updates:
- Added a new flag --use-cuda-graph to demoDiffusion to improve performance.
- Optimized GPT2 and T5 HuggingFace demos to use fp16 I/O tensors for fp16 networks.
TensorRT OSS v8.6.0
TensorRT OSS release corresponding to TensorRT 8.6.0.12 EA release.
- Updates since TensorRT 8.5.3 GA release.
- Please refer to the TensorRT 8.6.0.12 EA release notes for more information.
Key Features and Updates:
- demoDiffusion acceleration is now supported out of the box in TensorRT without requiring plugins.
- The following plugins have been removed accordingly: GroupNorm, LayerNorm, MultiHeadCrossAttention, MultiHeadFlashAttention, SeqLen2Spatial, and SplitGeLU.
- Added a new sample called onnx_custom_plugin.
We needed to force-push the main and release/8.6 branches and the v8.6.0 release. If you cloned/pulled the repo recently, please rebase the affected branches. Our apologies for this inconvenience.
TensorRT OSS v8.5.3
TensorRT OSS release corresponding to TensorRT 8.5.3.1 GA release.
- Updates since TensorRT 8.5.2 GA release.
- Please refer to the TensorRT 8.5.3 GA release notes for more information.
Key Features and Updates:
- Added the following HuggingFace demos: GPT-J-6B, GPT2-XL, and GPT2-Medium
- Added nvinfer1::plugin namespace
- Optimized KV Cache performance for T5
TensorRT OSS v8.5.2
TensorRT OSS release corresponding to TensorRT 8.5.2.2 GA release.
Updates since TensorRT 8.5.1 GA release.
Please refer to the TensorRT 8.5.2 GA release notes for more information.
Key Features and Updates:
- Plugin enhancements
  - Added LayerNormPlugin, SplitGeLUPlugin, GroupNormPlugin, and SeqLen2SpatialPlugin to support stable diffusion demo.
- Added KV-cache and beam search to GPT2 and T5 demos
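For readers unfamiliar with the beam search mentioned above, the following is a minimal, framework-free sketch of the general technique. It is not the code used in the GPT2 or T5 demos and it does not model a KV-cache. The names `beam_search`, `step_log_probs`, `TABLE`, and `step` are hypothetical, and the toy probability table stands in for a real language model.

```python
import math

def beam_search(step_log_probs, start_token, end_token, beam_width=2, max_len=10):
    """Return (sequence, score) for the best hypothesis found with a fixed-width beam.

    step_log_probs(seq) is a hypothetical callback that returns
    {token: log_prob} for the next token given the sequence so far.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for seq, score in beams:
            for token, logp in step_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        # Keep only the beam_width highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len still compete
    return max(finished, key=lambda c: c[1])

# Toy next-token distributions keyed by the previous token (assumed data).
TABLE = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"x": math.log(0.5), "y": math.log(0.5)},
    "b": {"z": math.log(0.95), "w": math.log(0.05)},
    "x": {"</s>": 0.0},
    "y": {"</s>": 0.0},
    "z": {"</s>": 0.0},
    "w": {"</s>": 0.0},
}

def step(seq):
    return TABLE[seq[-1]]

best_seq, best_score = beam_search(step, "<s>", "</s>", beam_width=2)
print(best_seq, best_score)
```

With `beam_width=1` the search is greedy and commits to "a" at the first step. A width of 2 keeps "b" alive long enough to find the higher-probability path through "z". In this simplified version, finished hypotheses count against the beam width at the step where they end.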
22.12
Commit used by the 22.12 TensorRT NGC container.
Added
- Stable Diffusion demo using TensorRT Plugins
- KV-cache and beam search to GPT2 and T5 demos
- Perplexity calculation to all HF demos
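As a reminder of what a perplexity figure measures, here is a minimal sketch of the standard definition: the exponential of the mean negative log-likelihood of the reference tokens. It is not taken from the HF demos. The function name `perplexity` and the sample log-probabilities are assumptions made for illustration.

```python
import math

def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical log-probabilities a model assigned to four reference tokens.
log_probs = [math.log(0.25)] * 4
print(perplexity(log_probs))  # close to 4: as uncertain as a uniform 4-way choice
```

Lower values mean the model assigns higher probability to the reference text. A model that predicts every token with probability 1 would score exactly 1.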
Changed
- Updated trex to v0.1.5
- Increased default workspace size in demoBERT to build BS=128 fp32 engines
- Use avg_iter=8 and timing cache to make demoBERT perf more stable
Removed
- None
TensorRT OSS v8.5.1
TensorRT OSS release corresponding to TensorRT 8.5.1.7 GA release.
- Updates since TensorRT 8.4.1 GA release.
- Please refer to the TensorRT 8.5.1 GA release notes for more information.
Key Features and Updates:
- Samples enhancements
  - Added sampleNamedDimensions which works with named dimensions.
  - Updated sampleINT8API and introductory_parser_samples to use ONNX models over Caffe/UFF.
  - Removed UFF/Caffe samples including sampleMNIST, end_to_end_tensorflow_mnist, sampleINT8, sampleMNISTAPI, sampleUffMNIST, sampleUffPluginV2Ext, engine_refit_mnist, int8_caffe_mnist, uff_custom_plugin, sampleFasterRCNN, sampleUffFasterRCNN, sampleGoogleNet, sampleSSD, sampleUffSSD, sampleUffMaskRCNN and uff_ssd.
- Plugin enhancements
  - Added GridAnchorRectPlugin to support rectangular feature maps in gridAnchorPlugin.
  - Added ROIAlignPlugin to support the ONNX operator RoiAlign. The ONNX parser will automatically route ROIAlign ops through the plugin.
  - Added Hopper support for the BERTQKVToContextPlugin plugin.
  - Exposed the use_int8_scale_max attribute in the BERTQKVToContextPlugin plugin to let users disable the default use of INT8 scale factors for optimizing softmax MAX reduction in versions 2 and 3 of the plugin.
- ONNX-TensorRT changes
  - Added support for operator Reciprocal.
- Build containers
  - Updated default cuda versions to 11.8.0.
- Tooling enhancements
  - Updated onnx-graphsurgeon to v0.3.25.
  - Updated Polygraphy to v0.43.1.
  - Updated polygraphy-extension-trtexec to v0.0.8.
  - Updated TensorFlow Quantization Toolkit to v0.2.0.
TensorRT OSS v8.4.3
TensorRT OSS release corresponding to TensorRT 8.4.3.1 release.
- Updates since TensorRT 8.4.2 release.
- Please refer to the TensorRT 8.4.3 release notes for more information.
Key Updates:
- Python packages for Python 3.10.
- Bug fix for potential overlaps in H2D and inference execution in trtexec.
22.08
Commit used by the 22.08 TensorRT NGC container.
Changelog
Updated TensorRT version to 8.4.2 - see the TensorRT 8.4.2 release notes for more information
Changed
- Updated default protobuf version to 3.20.x
- Updated ONNX-TensorRT submodule version to 22.08 tag
- Updated sampleIOFormats and sampleAlgorithmSelector to use ONNX models over Caffe
Fixes
- Fixed missing serialization member in CustomClipPlugin plugin
- Fixed various Python import issues
Added
- Added new DeBERTa demo
- Added version 2 for disentangledAttentionPlugin to support DeBERTa v2
Removed
- None
22.07
Commit used by the 22.07 TensorRT NGC container.
Changelog
Added
- polygraphy-trtexec-plugin tool for Polygraphy
- Multi-profile support for demoBERT
- KV cache support for HF BART demo
Changed
- Updated ONNX-GS to v0.3.20
Removed
- None