
Commit

Update TensorRT to 8.6.1
Signed-off-by: Ilya Sherstyuk <[email protected]>
ilyasher committed May 5, 2023
1 parent b83cbbd commit e314528
Showing 578 changed files with 4,943 additions and 3,505 deletions.
55 changes: 39 additions & 16 deletions .github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
@@ -1,44 +1,67 @@
---
name: TensorRT OSS Bug Report
about: Report any bugs to help us improve TensorRT.
title: ''
name: Report a TensorRT issue
about: The more information you share, the more feedback we can provide.
title: 'XXX failure of TensorRT X.Y when running XXX on GPU XXX'
labels: ''
assignees: ''

---

## Description

<!-- A clear and concise description of the bug or issue. -->
<!--
A clear and concise description of the issue.
For example: I tried to run model ABC on GPU, but it fails with the error below (share a 2-3 line error log).
-->


## Environment

**TensorRT Version**:
**NVIDIA GPU**:
**NVIDIA Driver Version**:
**CUDA Version**:
**CUDNN Version**:
**Operating System**:
**Python Version (if applicable)**:
**Tensorflow Version (if applicable)**:
**PyTorch Version (if applicable)**:
**Baremetal or Container (if so, version)**:
<!-- Please share any setup information you know. This will help us to understand and address your case. -->

**TensorRT Version**:

**NVIDIA GPU**:

**NVIDIA Driver Version**:

**CUDA Version**:

**CUDNN Version**:


Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):


## Relevant Files

<!-- Please include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive/Dropbox, etc.) -->

**Model link**:


## Steps To Reproduce

<!--
<!--
Craft a minimal bug report following this guide - https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
Please include:
* Exact steps/commands to build your repro
* Exact steps/commands to run your repro
* Full traceback of errors encountered
* Full traceback of errors encountered
-->

**Commands or scripts**:

**Have you tried [the latest release](https://developer.nvidia.com/tensorrt)?**:

**Can this model run on other frameworks?** For example run ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`):
47 changes: 29 additions & 18 deletions CHANGELOG.md
@@ -1,34 +1,45 @@
# TensorRT OSS Release Changelog

## [8.6.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#tensorrt-8) - 2023-03-14
## [8.6.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-1) - 2023-05-02

TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.
- Updates since [TensorRT 8.6.0 EA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-0-EA).
- Please refer to the [TensorRT 8.6.1.6 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-1) for more information.

Key Features and Updates:

- Added a new flag `--use-cuda-graph` to demoDiffusion to improve performance.
- Optimized GPT2 and T5 HuggingFace demos to use fp16 I/O tensors for fp16 networks.

## [8.6.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-0-EA) - 2023-03-10

TensorRT OSS release corresponding to TensorRT 8.6.0.12 EA release.
- Updates since [TensorRT 8.5.3 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-3).
- Please refer to the [TensorRT 8.6.0.12 EA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#tensorrt-8) for more information.
- Updates since [TensorRT 8.5.3 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-3).
- Please refer to the [TensorRT 8.6.0.12 EA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-0-EA) for more information.

Key Features and Updates:

- demoDiffusion acceleration is now supported out of the box in TensorRT without requiring plugins.
- The following plugins have been removed accordingly: GroupNorm, LayerNorm, MultiHeadCrossAttention, MultiHeadFlashAttention, SeqLen2Spatial, and SplitGeLU.
- Added a new sample called onnx_custom_plugin.

## [8.5.3 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-3) - 2023-01-30
## [8.5.3 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-3) - 2023-01-30

TensorRT OSS release corresponding to TensorRT 8.5.3.1 GA release.
- Updates since [TensorRT 8.5.2 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-2).
- Please refer to the [TensorRT 8.5.3 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-3) for more information.
- Updates since [TensorRT 8.5.2 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-2).
- Please refer to the [TensorRT 8.5.3 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-3) for more information.

Key Features and Updates:

- Added the following HuggingFace demos: GPT-J-6B, GPT2-XL, and GPT2-Medium
- Added nvinfer1::plugin namespace
- Optimized KV Cache performance for T5

## [8.5.2 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-2) - 2022-12-12
## [8.5.2 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-2) - 2022-12-12

TensorRT OSS release corresponding to TensorRT 8.5.2.2 GA release.
- Updates since [TensorRT 8.5.1 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-1).
- Please refer to the [TensorRT 8.5.2 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-2) for more information.
- Updates since [TensorRT 8.5.1 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-1).
- Please refer to the [TensorRT 8.5.2 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-2) for more information.

Key Features and Updates:

@@ -51,11 +62,11 @@ Key Features and Updates:
### Removed
- None

## [8.5.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-1) - 2022-11-01
## [8.5.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-1) - 2022-11-01

TensorRT OSS release corresponding to TensorRT 8.5.1.7 GA release.
- Updates since [TensorRT 8.4.1 GA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.4.1).
- Please refer to the [TensorRT 8.5.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-1) for more information.
- Please refer to the [TensorRT 8.5.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-1) for more information.

Key Features and Updates:

@@ -84,7 +95,7 @@ Key Features and Updates:

## [22.08](https://github.com/NVIDIA/TensorRT/releases/tag/22.08) - 2022-08-16

Updated TensorRT version to 8.4.2 - see the [TensorRT 8.4.2 release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-4-2) for more information
Updated TensorRT version to 8.4.2 - see the [TensorRT 8.4.2 release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-4-2) for more information

### Changed
- Updated default protobuf version to 3.20.x
@@ -114,11 +125,11 @@ Updated TensorRT version to 8.4.2 - see the [TensorRT 8.4.2 release notes](https
### Removed
- None

## [8.4.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-4-1) - 2022-06-14
## [8.4.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-4-1) - 2022-06-14

TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.
- Updates since [TensorRT 8.2.1 GA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.2.1).
- Please refer to the [TensorRT 8.4.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-4-1) for more information.
- Please refer to the [TensorRT 8.4.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-4-1) for more information.

Key Features and Updates:

@@ -258,11 +269,11 @@ Key Features and Updates:
### Removed
- Unused source file(s) in demo/BERT

## [8.2.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-1) - 2021-11-24
## [8.2.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-2-1) - 2021-11-24

TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
- Updates since [TensorRT 8.2.0 EA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.2.0-EA).
- Please refer to the [TensorRT 8.2.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-1) for more information.
- Please refer to the [TensorRT 8.2.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-2-1) for more information.

- ONNX parser [v8.2.1](https://github.com/onnx/onnx-tensorrt/releases/tag/release%2F8.2-GA)
- Removed duplicate constant layer checks that caused some performance regressions
@@ -316,7 +327,7 @@ TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
- Updated Python documentation for `add_reduce`, `add_top_k`, and `ISoftMaxLayer`
- Renamed default GitHub branch to `main` and updated hyperlinks

## [8.2.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-0-EA) - 2021-10-05
## [8.2.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-2-0-EA) - 2021-10-05
### Added
- [Demo applications](demo/HuggingFace) showcasing TensorRT inference of [HuggingFace Transformers](https://huggingface.co/transformers).
- Support is currently extended to GPT-2 and T5 models.
@@ -426,7 +437,7 @@ TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
## [21.07](https://github.com/NVIDIA/TensorRT/releases/tag/21.07) - 2021-07-21
Identical to the TensorRT-OSS [8.0.1](https://github.com/NVIDIA/TensorRT/releases/tag/8.0.1) Release.

## [8.0.1](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#tensorrt-8) - 2021-07-02
## [8.0.1](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#tensorrt-8) - 2021-07-02
### Added
- Added support for the following ONNX operators: `Celu`, `CumSum`, `EyeLike`, `GatherElements`, `GlobalLpPool`, `GreaterOrEqual`, `LessOrEqual`, `LpNormalization`, `LpPool`, `ReverseSequence`, and `SoftmaxCrossEntropyLoss` [details]().
- Rehauled `Resize` ONNX operator, now fully supporting the following modes:
Expand Down
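Most of the CHANGELOG.md link changes in this diff follow one pattern: the `tensorrt-8.html` page component is dropped from the release-notes URL while the anchor is kept. A minimal Python sketch of that rewrite, assuming a plain prefix substitution is sufficient. The helper name `modernize_release_notes_url` is hypothetical and not part of the repository, and the sketch does not cover the 8.6.0 EA links, whose anchor also changed from `#tensorrt-8` to `#rel-8-6-0-EA`.

```python
OLD_PREFIX = "https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#"
NEW_PREFIX = "https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#"


def modernize_release_notes_url(text: str) -> str:
    """Swap the old release-notes URL prefix for the new one, keeping the anchor.

    Hypothetical helper for illustration only; text without the old prefix
    is returned unchanged.
    """
    return text.replace(OLD_PREFIX, NEW_PREFIX)


old_line = (
    "- Updates since [TensorRT 8.5.2 GA release]"
    "(https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-5-2)."
)
print(modernize_release_notes_url(old_line))
```

Links that already point at GitHub release tags, such as the 8.4.1 and 8.2.1 "Updates since" entries, do not contain the old prefix and pass through untouched.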
6 changes: 3 additions & 3 deletions CMakeLists.txt
@@ -48,7 +48,7 @@ set(CMAKE_SKIP_BUILD_RPATH True)
project(TensorRT
LANGUAGES CXX CUDA
VERSION ${TRT_VERSION}
DESCRIPTION "TensorRT is a C++ library that facilitates high performance inference on NVIDIA GPUs and deep learning accelerators."
DESCRIPTION "TensorRT is a C++ library that facilitates high-performance inference on NVIDIA GPUs and deep learning accelerators."
HOMEPAGE_URL "https://github.com/NVIDIA/TensorRT")

if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
@@ -88,8 +88,8 @@ endif()
############################################################################################
# Dependencies

set(DEFAULT_CUDA_VERSION 11.3.1)
set(DEFAULT_CUDNN_VERSION 8.2)
set(DEFAULT_CUDA_VERSION 12.0.1)
set(DEFAULT_CUDNN_VERSION 8.8)
set(DEFAULT_PROTOBUF_VERSION 3.20.1)

# Dependency Version Resolution
26 changes: 13 additions & 13 deletions README.md
@@ -25,8 +25,8 @@ You can skip the **Build** section to enjoy TensorRT with Python.
## Prerequisites
To build the TensorRT-OSS components, you will first need the following software packages.

**TensorRT EA build**
* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.6.0.12
**TensorRT GA build**
* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.6.1.6

**System Packages**
* [CUDA](https://developer.nvidia.com/cuda-toolkit)
@@ -48,8 +48,8 @@ To build the TensorRT-OSS components, you will first need the following software
* (Cross compilation for Jetson platform) [NVIDIA JetPack](https://developer.nvidia.com/embedded/jetpack) >= 5.0 (current support only for TensorRT 8.4.0 and TensorRT 8.5.2)
* (Cross compilation for QNX platform) [QNX Toolchain](https://blackberry.qnx.com/en)
* PyPI packages (for demo applications/tests)
* [onnx](https://pypi.org/project/onnx/) 1.9.0
* [onnxruntime](https://pypi.org/project/onnxruntime/) 1.8.0
* [onnx](https://pypi.org/project/onnx/)
* [onnxruntime](https://pypi.org/project/onnxruntime/)
* [tensorflow-gpu](https://pypi.org/project/tensorflow/) >= 2.5.1
* [Pillow](https://pypi.org/project/Pillow/) >= 9.0.1
* [pycuda](https://pypi.org/project/pycuda/) < 2021.1
@@ -70,18 +70,18 @@ To build the TensorRT-OSS components, you will first need the following software
git submodule update --init --recursive
```

2. #### (Optional - if not using TensorRT container) Specify the TensorRT EA release build path
2. #### (Optional - if not using TensorRT container) Specify the TensorRT GA release build path

If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.

Else download and extract the TensorRT EA build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).
Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).

**Example: Ubuntu 20.04 on x86-64 with cuda-12.0**

```bash
cd ~/Downloads
tar -xvzf TensorRT-8.6.0.12.Linux.x86_64-gnu.cuda-12.0.tar.gz
export TRT_LIBPATH=`pwd`/TensorRT-8.6.0.12
tar -xvzf TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-12.0.tar.gz
export TRT_LIBPATH=`pwd`/TensorRT-8.6.1.6
```


@@ -111,9 +111,9 @@ For Linux platforms, we recommend that you generate a docker container for build
```bash
./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda11.4
```
**Example: Ubuntu 20.04 on aarch64 with cuda-11.4.2**
**Example: Ubuntu 20.04 on aarch64 with cuda-11.8**
```bash
./docker/build.sh --file docker/ubuntu-20.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu20.04-cuda11.4
./docker/build.sh --file docker/ubuntu-20.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu20.04-cuda11.8 --cuda 11.8.0
```
2. #### Launch the TensorRT-OSS build container.
@@ -143,7 +143,7 @@ For Linux platforms, we recommend that you generate a docker container for build
yum -y install centos-release-scl
yum-config-manager --enable rhel-server-rhscl-7-rpms
yum -y install devtoolset-8
export PATH="/opt/rh/devtoolset-8/root/bin:${PATH}
export PATH="/opt/rh/devtoolset-8/root/bin:${PATH}"
```
**Example: Linux (aarch64) build with default cuda-12.0**
@@ -174,14 +174,14 @@ For Linux platforms, we recommend that you generate a docker container for build
> NOTE: The latest JetPack SDK v5.1 only supports TensorRT 8.5.2.
> NOTE:
<br> 1. The default CUDA version used by CMake is 11.8.0. To override this, for example to 10.2, append `-DCUDA_VERSION=10.2` to the cmake command.
<br> 1. The default CUDA version used by CMake is 12.0.1. To override this, for example to 11.8, append `-DCUDA_VERSION=11.8` to the cmake command.
<br> 2. If samples fail to link on CentOS7, create this symbolic link: `ln -s $TRT_OUT_DIR/libnvinfer_plugin.so $TRT_OUT_DIR/libnvinfer_plugin.so.8`
* Required CMake build arguments are:
- `TRT_LIB_DIR`: Path to the TensorRT installation directory containing libraries.
- `TRT_OUT_DIR`: Output directory where generated build artifacts will be copied.
* Optional CMake build arguments:
- `CMAKE_BUILD_TYPE`: Specify if binaries generated are for release or debug (contain debug symbols). Values consists of [`Release`] | `Debug`
- `CUDA_VERISON`: The version of CUDA to target, for example [`11.7.1`].
- `CUDA_VERSION`: The version of CUDA to target, for example [`11.7.1`].
- `CUDNN_VERSION`: The version of cuDNN to target, for example [`8.6`].
- `PROTOBUF_VERSION`: The version of Protobuf to use, for example [`3.0.0`]. Note: Changing this will not configure CMake to use a system version of Protobuf, it will configure CMake to download and try building that version.
- `CMAKE_TOOLCHAIN_FILE`: The path to a toolchain file for cross compilation.
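The arguments listed above are passed to CMake as `-D<NAME>=<value>` definitions, as the note about `-DCUDA_VERSION=11.8` illustrates. A minimal Python sketch that assembles such a command line, assuming the source directory is `..` relative to the build directory. The helper `cmake_command` and the example paths are hypothetical and only show the shape of the invocation.

```python
from typing import Dict, List, Optional


def cmake_command(trt_lib_dir: str, trt_out_dir: str,
                  optional: Optional[Dict[str, str]] = None,
                  source_dir: str = "..") -> List[str]:
    """Build a cmake argument list from -D definitions.

    Hypothetical helper: the two required arguments come first, followed by
    any optional ones in the order they were given.
    """
    definitions = {"TRT_LIB_DIR": trt_lib_dir, "TRT_OUT_DIR": trt_out_dir}
    definitions.update(optional or {})
    return ["cmake", source_dir] + [f"-D{name}={value}" for name, value in definitions.items()]


# Example paths are placeholders, not locations the repository prescribes.
cmd = cmake_command(
    "/path/to/TensorRT-8.6.1.6/lib",
    "/path/to/out",
    {"CUDA_VERSION": "11.8", "CMAKE_BUILD_TYPE": "Release"},
)
print(" ".join(cmd))
```

Returning a list rather than a single string keeps each definition as its own argument, which avoids quoting problems if the list is later handed to `subprocess.run`.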
2 changes: 1 addition & 1 deletion VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
8.6.0.12
8.6.1.6
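The `VERSION` file holds a four-part dotted version string, changed here from `8.6.0.12` to `8.6.1.6`. A minimal Python sketch for comparing two such strings, assuming every component is a non-negative integer. `parse_version` is a hypothetical helper, not something the repository provides.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted version string such as '8.6.1.6' into integer parts."""
    return tuple(int(part) for part in text.strip().split("."))


old = parse_version("8.6.0.12")
new = parse_version("8.6.1.6")

# Tuples compare component by component, so the third field (1 vs 0) decides
# the ordering before the fourth field (6 vs 12) is ever considered.
print(new > old)
```

Comparing the raw strings instead would be unreliable, because string comparison works character by character rather than field by field.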
2 changes: 1 addition & 1 deletion cmake/toolchains/cmake_aarch64-native.toolchain
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
2 changes: 1 addition & 1 deletion demo/BERT/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
3 changes: 2 additions & 1 deletion demo/BERT/README.md
100644 → 100755
@@ -64,7 +64,7 @@ Since the tokenizer and projection of the final predictions are not nearly as co

The tokenizer splits the input text into tokens that can be consumed by the model. For details on this process, see [this tutorial](https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/).

To run the BERT model in TensorRT, we construct the model using TensorRT APIs and import the weights from a pre-trained TensorFlow checkpoint from [NGC](https://ngc.nvidia.com/models/nvidian:bert_tf_v2_large_fp16_128). Finally, a TensorRT engine is generated and serialized to the disk. The various inference scripts then load this engine for inference.
To run the BERT model in TensorRT, we construct the model using TensorRT APIs and import the weights from a pre-trained TensorFlow checkpoint from [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_tf_ckpt_large_qa_squad2_amp_128). Finally, a TensorRT engine is generated and serialized to the disk. The various inference scripts then load this engine for inference.

Lastly, the tokens predicted by the model are projected back to the original text to get a final result.

@@ -586,3 +586,4 @@ Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` o
| 384 | 32 | 40.79 | 40.97 | 40.46 |
| 384 | 64 | 78.04 | 78.41 | 77.51 |
| 384 | 128 | 151.33 | 151.62 | 150.76 |

2 changes: 1 addition & 1 deletion demo/BERT/builder.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#!/usr/bin/env python3
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");