Update TensorRT to 8.6.1

Signed-off-by: Ilya Sherstyuk <[email protected]>
NVIDIA · May 5, 2023 · e314528
1 parent b83cbbd
Showing 578 changed files with 4,943 additions and 3,505 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -1,44 +1,67 @@
 ---
-name: TensorRT OSS Bug Report
-about: Report any bugs to help us improve TensorRT.
-title: ''
+name: Report a TensorRT issue
+about: The more information you share, the more feedback we can provide.
+title: 'XXX failure of TensorRT X.Y when running XXX on GPU XXX'
 labels: ''
 assignees: ''
 
 ---
 
 ## Description
 
-<!-- A clear and concise description of the bug or issue. -->
+<!--
+  A clear and concise description of the issue.
+
+  For example: I tried to run model ABC on GPU, but it fails with the error below (share a 2-3 line error log).
+-->
 
 
 ## Environment
 
-**TensorRT Version**: 
-**NVIDIA GPU**: 
-**NVIDIA Driver Version**: 
-**CUDA Version**: 
-**CUDNN Version**: 
-**Operating System**: 
-**Python Version (if applicable)**: 
-**Tensorflow Version (if applicable)**: 
-**PyTorch Version (if applicable)**: 
-**Baremetal or Container (if so, version)**: 
+<!-- Please share any setup information you know. This will help us to understand and address your case. -->
+
+**TensorRT Version**:
+
+**NVIDIA GPU**:
+
+**NVIDIA Driver Version**:
+
+**CUDA Version**:
+
+**CUDNN Version**:
+
+
+Operating System:
+
+Python Version (if applicable):
+
+Tensorflow Version (if applicable):
+
+PyTorch Version (if applicable):
+
+Baremetal or Container (if so, version):
 
 
 ## Relevant Files
 
 <!-- Please include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive/Dropbox, etc.) -->
 
+**Model link**:
+
 
 ## Steps To Reproduce
 
-<!-- 
+<!--
   Craft a minimal bug report following this guide - https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
 
   Please include:
   * Exact steps/commands to build your repro
   * Exact steps/commands to run your repro
-  * Full traceback of errors encountered 
+  * Full traceback of errors encountered
 -->
 
+**Commands or scripts**:
+
+**Have you tried [the latest release](https://developer.nvidia.com/tensorrt)?**:
+
+**Can this model run on other frameworks?** For example run ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`):
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,34 +1,45 @@
 # TensorRT OSS Release Changelog
 
-## [8.6.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#tensorrt-8) - 2023-03-14
+## [8.6.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-1) - 2023-05-02
+
+TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.
+- Updates since [TensorRT 8.6.0 EA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-0-EA).
+- Please refer to the [TensorRT 8.6.1.6 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-1) for more information.
+
+Key Features and Updates:
+
+- Added a new flag `--use-cuda-graph` to demoDiffusion to improve performance.
+- Optimized GPT2 and T5 HuggingFace demos to use fp16 I/O tensors for fp16 networks.
+
+## [8.6.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-0-EA) - 2023-03-10
 
 TensorRT OSS release corresponding to TensorRT 8.6.0.12 EA release.
-- Updates since [TensorRT 8.5.3 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-3).
-- Please refer to the [TensorRT 8.6.0.12 EA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#tensorrt-8) for more information.
+- Updates since [TensorRT 8.5.3 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-3).
+- Please refer to the [TensorRT 8.6.0.12 EA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-0-EA) for more information.
 
 Key Features and Updates:
 
 - demoDiffusion acceleration is now supported out of the box in TensorRT without requiring plugins.
   - The following plugins have been removed accordingly: GroupNorm, LayerNorm, MultiHeadCrossAttention, MultiHeadFlashAttention, SeqLen2Spatial, and SplitGeLU.
 - Added a new sample called onnx_custom_plugin.
 
-## [8.5.3 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-3) - 2023-01-30
+## [8.5.3 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-3) - 2023-01-30
 
 TensorRT OSS release corresponding to TensorRT 8.5.3.1 GA release.
-- Updates since [TensorRT 8.5.2 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-2).
-- Please refer to the [TensorRT 8.5.3 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-3) for more information.
+- Updates since [TensorRT 8.5.2 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-2).
+- Please refer to the [TensorRT 8.5.3 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-3) for more information.
 
 Key Features and Updates:
 
 - Added the following HuggingFace demos: GPT-J-6B, GPT2-XL, and GPT2-Medium
 - Added nvinfer1::plugin namespace
 - Optimized KV Cache performance for T5
 
-## [8.5.2 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-2) - 2022-12-12
+## [8.5.2 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-2) - 2022-12-12
 
 TensorRT OSS release corresponding to TensorRT 8.5.2.2 GA release.
-- Updates since [TensorRT 8.5.1 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-1).
-- Please refer to the [TensorRT 8.5.2 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-2) for more information.
+- Updates since [TensorRT 8.5.1 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-1).
+- Please refer to the [TensorRT 8.5.2 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-2) for more information.
 
 Key Features and Updates:
 
@@ -51,11 +62,11 @@ Key Features and Updates:
 ### Removed
 - None
 
-## [8.5.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-1) - 2022-11-01
+## [8.5.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-1) - 2022-11-01
 
 TensorRT OSS release corresponding to TensorRT 8.5.1.7 GA release.
 - Updates since [TensorRT 8.4.1 GA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.4.1).
-- Please refer to the [TensorRT 8.5.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-1) for more information.
+- Please refer to the [TensorRT 8.5.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-1) for more information.
 
 Key Features and Updates:
 
@@ -84,7 +95,7 @@ Key Features and Updates:
 
 ## [22.08](https://github.com/NVIDIA/TensorRT/releases/tag/22.08) - 2022-08-16
 
-Updated TensorRT version to 8.4.2 - see the [TensorRT 8.4.2 release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-4-2) for more information
+Updated TensorRT version to 8.4.2 - see the [TensorRT 8.4.2 release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-4-2) for more information
 
 ### Changed
 - Updated default protobuf version to 3.20.x
@@ -114,11 +125,11 @@ Updated TensorRT version to 8.4.2 - see the [TensorRT 8.4.2 release notes](https
 ### Removed
 - None
 
-## [8.4.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-4-1) - 2022-06-14
+## [8.4.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-4-1) - 2022-06-14
 
 TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.
 - Updates since [TensorRT 8.2.1 GA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.2.1).
-- Please refer to the [TensorRT 8.4.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-4-1) for more information.
+- Please refer to the [TensorRT 8.4.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-4-1) for more information.
 
 Key Features and Updates:
 
@@ -258,11 +269,11 @@ Key Features and Updates:
 ### Removed
 - Unused source file(s) in demo/BERT
 
-## [8.2.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-1) - 2021-11-24
+## [8.2.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-2-1) - 2021-11-24
 
 TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
 - Updates since [TensorRT 8.2.0 EA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.2.0-EA).
-- Please refer to the [TensorRT 8.2.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-1) for more information.
+- Please refer to the [TensorRT 8.2.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-2-1) for more information.
 
 - ONNX parser [v8.2.1](https://github.com/onnx/onnx-tensorrt/releases/tag/release%2F8.2-GA)
   - Removed duplicate constant layer checks that caused some performance regressions
@@ -316,7 +327,7 @@ TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
   - Updated Python documentation for `add_reduce`, `add_top_k`, and `ISoftMaxLayer`
   - Renamed default GitHub branch to `main` and updated hyperlinks
 
-## [8.2.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-0-EA) - 2021-10-05
+## [8.2.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-2-0-EA) - 2021-10-05
 ### Added
 - [Demo applications](demo/HuggingFace) showcasing TensorRT inference of [HuggingFace Transformers](https://huggingface.co/transformers).
   - Support is currently extended to GPT-2 and T5 models.
@@ -426,7 +437,7 @@ TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
 ## [21.07](https://github.com/NVIDIA/TensorRT/releases/tag/21.07) - 2021-07-21
 Identical to the TensorRT-OSS [8.0.1](https://github.com/NVIDIA/TensorRT/releases/tag/8.0.1) Release.
 
-## [8.0.1](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#tensorrt-8) - 2021-07-02
+## [8.0.1](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#tensorrt-8) - 2021-07-02
 ### Added
 - Added support for the following ONNX operators: `Celu`, `CumSum`, `EyeLike`, `GatherElements`, `GlobalLpPool`, `GreaterOrEqual`, `LessOrEqual`, `LpNormalization`, `LpPool`, `ReverseSequence`, and `SoftmaxCrossEntropyLoss` [details]().
 - Rehauled `Resize` ONNX operator, now fully supporting the following modes:

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -48,7 +48,7 @@ set(CMAKE_SKIP_BUILD_RPATH True)
 project(TensorRT
         LANGUAGES CXX CUDA
         VERSION ${TRT_VERSION}
-        DESCRIPTION "TensorRT is a C++ library that facilitates high performance inference on NVIDIA GPUs and deep learning accelerators."
+        DESCRIPTION "TensorRT is a C++ library that facilitates high-performance inference on NVIDIA GPUs and deep learning accelerators."
         HOMEPAGE_URL "https://github.com/NVIDIA/TensorRT")
 
 if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
@@ -88,8 +88,8 @@ endif()
 ############################################################################################
 # Dependencies
 
-set(DEFAULT_CUDA_VERSION 11.3.1)
-set(DEFAULT_CUDNN_VERSION 8.2)
+set(DEFAULT_CUDA_VERSION 12.0.1)
+set(DEFAULT_CUDNN_VERSION 8.8)
 set(DEFAULT_PROTOBUF_VERSION 3.20.1)
 
 # Dependency Version Resolution

diff --git a/README.md b/README.md
@@ -25,8 +25,8 @@ You can skip the **Build** section to enjoy TensorRT with Python.
 ## Prerequisites
 To build the TensorRT-OSS components, you will first need the following software packages.
 
-**TensorRT EA build**
-* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.6.0.12
+**TensorRT GA build**
+* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.6.1.6
 
 **System Packages**
 * [CUDA](https://developer.nvidia.com/cuda-toolkit)
@@ -48,8 +48,8 @@ To build the TensorRT-OSS components, you will first need the following software
   * (Cross compilation for Jetson platform) [NVIDIA JetPack](https://developer.nvidia.com/embedded/jetpack) >= 5.0 (current support only for TensorRT 8.4.0 and TensorRT 8.5.2)
   * (Cross compilation for QNX platform) [QNX Toolchain](https://blackberry.qnx.com/en)
 * PyPI packages (for demo applications/tests)
-  * [onnx](https://pypi.org/project/onnx/) 1.9.0
-  * [onnxruntime](https://pypi.org/project/onnxruntime/) 1.8.0
+  * [onnx](https://pypi.org/project/onnx/)
+  * [onnxruntime](https://pypi.org/project/onnxruntime/)
   * [tensorflow-gpu](https://pypi.org/project/tensorflow/) >= 2.5.1
   * [Pillow](https://pypi.org/project/Pillow/) >= 9.0.1
   * [pycuda](https://pypi.org/project/pycuda/) < 2021.1
@@ -70,18 +70,18 @@ To build the TensorRT-OSS components, you will first need the following software
 	git submodule update --init --recursive
 	```
 
-2. #### (Optional - if not using TensorRT container) Specify the TensorRT EA release build path
+2. #### (Optional - if not using TensorRT container) Specify the TensorRT GA release build path
 
     If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.
 
-    Else download and extract the TensorRT EA build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).
+    Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).
 
     **Example: Ubuntu 20.04 on x86-64 with cuda-12.0**
 
     ```bash
     cd ~/Downloads
-    tar -xvzf TensorRT-8.6.0.12.Linux.x86_64-gnu.cuda-12.0.tar.gz
-    export TRT_LIBPATH=`pwd`/TensorRT-8.6.0.12
+    tar -xvzf TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-12.0.tar.gz
+    export TRT_LIBPATH=`pwd`/TensorRT-8.6.1.6
     ```
 
 
@@ -111,9 +111,9 @@ For Linux platforms, we recommend that you generate a docker container for build
     ```bash
     ./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda11.4
     ```
-    **Example: Ubuntu 20.04 on aarch64 with cuda-11.4.2**
+    **Example: Ubuntu 20.04 on aarch64 with cuda-11.8**
     ```bash
-    ./docker/build.sh --file docker/ubuntu-20.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu20.04-cuda11.4
+    ./docker/build.sh --file docker/ubuntu-20.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu20.04-cuda11.8 --cuda 11.8.0
     ```
 
 2. #### Launch the TensorRT-OSS build container.
@@ -143,7 +143,7 @@ For Linux platforms, we recommend that you generate a docker container for build
     yum -y install centos-release-scl
     yum-config-manager --enable rhel-server-rhscl-7-rpms
     yum -y install devtoolset-8
-    export PATH="/opt/rh/devtoolset-8/root/bin:${PATH}
+    export PATH="/opt/rh/devtoolset-8/root/bin:${PATH}"
     ```
 
     **Example: Linux (aarch64) build with default cuda-12.0**
@@ -174,14 +174,14 @@ For Linux platforms, we recommend that you generate a docker container for build
     > NOTE: The latest JetPack SDK v5.1 only supports TensorRT 8.5.2.
 
 	> NOTE:
-	<br> 1. The default CUDA version used by CMake is 11.8.0. To override this, for example to 10.2, append `-DCUDA_VERSION=10.2` to the cmake command.
+	<br> 1. The default CUDA version used by CMake is 12.0.1. To override this, for example to 11.8, append `-DCUDA_VERSION=11.8` to the cmake command.
 	<br> 2. If samples fail to link on CentOS7, create this symbolic link: `ln -s $TRT_OUT_DIR/libnvinfer_plugin.so $TRT_OUT_DIR/libnvinfer_plugin.so.8`
 * Required CMake build arguments are:
 	- `TRT_LIB_DIR`: Path to the TensorRT installation directory containing libraries.
 	- `TRT_OUT_DIR`: Output directory where generated build artifacts will be copied.
 * Optional CMake build arguments:
 	- `CMAKE_BUILD_TYPE`: Specify if binaries generated are for release or debug (contain debug symbols). Values consists of [`Release`] | `Debug`
-	- `CUDA_VERISON`: The version of CUDA to target, for example [`11.7.1`].
+	- `CUDA_VERSION`: The version of CUDA to target, for example [`11.7.1`].
 	- `CUDNN_VERSION`: The version of cuDNN to target, for example [`8.6`].
 	- `PROTOBUF_VERSION`:  The version of Protobuf to use, for example [`3.0.0`]. Note: Changing this will not configure CMake to use a system version of Protobuf, it will configure CMake to download and try building that version.
 	- `CMAKE_TOOLCHAIN_FILE`: The path to a toolchain file for cross compilation.

diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-8.6.0.12
+8.6.1.6
diff --git a/cmake/toolchains/cmake_aarch64-native.toolchain b/cmake/toolchains/cmake_aarch64-native.toolchain
@@ -1,5 +1,5 @@
 #
-# SPDX-FileCopyrightText: Copyright (c) 1993-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");

diff --git a/demo/BERT/CMakeLists.txt b/demo/BERT/CMakeLists.txt
@@ -1,5 +1,5 @@
 #
-# SPDX-FileCopyrightText: Copyright (c) 1993-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");

diff --git a/demo/BERT/README.md b/demo/BERT/README.md
@@ -64,7 +64,7 @@ Since the tokenizer and projection of the final predictions are not nearly as co
 
 The tokenizer splits the input text into tokens that can be consumed by the model. For details on this process, see [this tutorial](https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/).
 
-To run the BERT model in TensorRT, we construct the model using TensorRT APIs and import the weights from a pre-trained TensorFlow checkpoint from [NGC](https://ngc.nvidia.com/models/nvidian:bert_tf_v2_large_fp16_128). Finally, a TensorRT engine is generated and serialized to the disk. The various inference scripts then load this engine for inference.
+To run the BERT model in TensorRT, we construct the model using TensorRT APIs and import the weights from a pre-trained TensorFlow checkpoint from [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_tf_ckpt_large_qa_squad2_amp_128). Finally, a TensorRT engine is generated and serialized to the disk. The various inference scripts then load this engine for inference.
 
 Lastly, the tokens predicted by the model are projected back to the original text to get a final result.
 
@@ -586,3 +586,4 @@ Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` o
 | 384 | 32 | 40.79 | 40.97 | 40.46 |
 | 384 | 64 | 78.04 | 78.41 | 77.51 |
 | 384 | 128 | 151.33 | 151.62 | 150.76 |
+
diff --git a/demo/BERT/builder.py b/demo/BERT/builder.py
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 #
-# SPDX-FileCopyrightText: Copyright (c) 1993-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");