Mass integration for 23.08 release

1. update pytorch-quantization to 2.1.3
2. update SD to support torch 2.x
3. update docker container
4. misc fixes in samples

Signed-off-by: Vincent Huang <[email protected]>
NVIDIA · Aug 7, 2023 · 35477bd
1 parent a167852

Showing 85 changed files with 560 additions and 314 deletions.
diff --git a/README.md b/README.md
@@ -31,7 +31,7 @@ To build the TensorRT-OSS components, you will first need the following software
 **System Packages**
 * [CUDA](https://developer.nvidia.com/cuda-toolkit)
   * Recommended versions:
-  * cuda-12.0.1 + cuDNN-8.8
+  * cuda-12.2.0 + cuDNN-8.8
   * cuda-11.8.0 + cuDNN-8.8
 * [GNU make](https://ftp.gnu.org/gnu/make/) >= v4.1
 * [cmake](https://github.com/Kitware/CMake/releases) >= v3.13
@@ -99,9 +99,9 @@ For Linux platforms, we recommend that you generate a docker container for build
 1. #### Generate the TensorRT-OSS build container.
     The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build scripts. The build containers are configured for building TensorRT OSS out-of-the-box.
 
-    **Example: Ubuntu 20.04 on x86-64 with cuda-12.0 (default)**
+    **Example: Ubuntu 20.04 on x86-64 with cuda-12.1 (default)**
     ```bash
-    ./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.0
+    ./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.1
     ```
     **Example: CentOS/RedHat 7 on x86-64 with cuda-11.8**
     ```bash
@@ -119,7 +119,7 @@ For Linux platforms, we recommend that you generate a docker container for build
 2. #### Launch the TensorRT-OSS build container.
     **Example: Ubuntu 20.04 build container**
 	```bash
-	./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.0 --gpus all
+	./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.1 --gpus all
 	```
 	> NOTE:
   <br> 1. Use the `--tag` corresponding to build container generated in Step 1.
@@ -130,7 +130,7 @@ For Linux platforms, we recommend that you generate a docker container for build
 ## Building TensorRT-OSS
 * Generate Makefiles and build.
 
-    **Example: Linux (x86-64) build with default cuda-12.0**
+    **Example: Linux (x86-64) build with default cuda-12.1**
 	```bash
 	cd $TRT_OSSPATH
 	mkdir -p build && cd build
@@ -146,7 +146,7 @@ For Linux platforms, we recommend that you generate a docker container for build
     export PATH="/opt/rh/devtoolset-8/root/bin:${PATH}"
     ```
 
-    **Example: Linux (aarch64) build with default cuda-12.0**
+    **Example: Linux (aarch64) build with default cuda-12.1**
 	```bash
 	cd $TRT_OSSPATH
 	mkdir -p build && cd build
@@ -174,7 +174,7 @@ For Linux platforms, we recommend that you generate a docker container for build
     > NOTE: The latest JetPack SDK v5.1 only supports TensorRT 8.5.2.
 
 	> NOTE:
-	<br> 1. The default CUDA version used by CMake is 12.0.1. To override this, for example to 11.8, append `-DCUDA_VERSION=11.8` to the cmake command.
+	<br> 1. The default CUDA version used by CMake is 11.4.1. To override this, for example to 11.8, append `-DCUDA_VERSION=11.8` to the cmake command.
 	<br> 2. If samples fail to link on CentOS7, create this symbolic link: `ln -s $TRT_OUT_DIR/libnvinfer_plugin.so $TRT_OUT_DIR/libnvinfer_plugin.so.8`
 * Required CMake build arguments are:
 	- `TRT_LIB_DIR`: Path to the TensorRT installation directory containing libraries.

diff --git a/demo/Diffusion/README.md b/demo/Diffusion/README.md
@@ -16,7 +16,7 @@ cd TensorRT
 Install nvidia-docker using [these intructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
 
 ```bash
-docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.02-py3 /bin/bash
+docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.06-py3 /bin/bash
 ```
 
 ### Install latest TensorRT release

diff --git a/demo/Diffusion/demo_img2img.py b/demo/Diffusion/demo_img2img.py
@@ -94,7 +94,7 @@ def parseArgs():
         force_export=args.force_onnx_export, force_optimize=args.force_onnx_optimize, \
         force_build=args.force_engine_build, \
         static_batch=args.build_static_batch, static_shape=not args.build_dynamic_shape, \
-        enable_refit=args.build_enable_refit, enable_preview=args.build_preview_features, enable_all_tactics=args.build_all_tactics, \
+        enable_refit=args.build_enable_refit, enable_all_tactics=args.build_all_tactics, \
         timing_cache=args.timing_cache, onnx_refit_dir=args.onnx_refit_dir)
     demo.loadResources(image_height, image_width, batch_size, args.seed)
 

diff --git a/demo/Diffusion/demo_inpaint.py b/demo/Diffusion/demo_inpaint.py
@@ -104,7 +104,7 @@ def parseArgs():
         force_export=args.force_onnx_export, force_optimize=args.force_onnx_optimize, \
         force_build=args.force_engine_build, \
         static_batch=args.build_static_batch, static_shape=not args.build_dynamic_shape, \
-        enable_preview=args.build_preview_features, enable_all_tactics=args.build_all_tactics, \
+        enable_all_tactics=args.build_all_tactics, \
         timing_cache=args.timing_cache)
     demo.loadResources(image_height, image_width, batch_size, args.seed)
 

diff --git a/demo/Diffusion/demo_txt2img.py b/demo/Diffusion/demo_txt2img.py
@@ -82,7 +82,7 @@ def parseArgs():
         force_export=args.force_onnx_export, force_optimize=args.force_onnx_optimize, \
         force_build=args.force_engine_build, \
         static_batch=args.build_static_batch, static_shape=not args.build_dynamic_shape, \
-        enable_refit=args.build_enable_refit, enable_preview=args.build_preview_features, enable_all_tactics=args.build_all_tactics, \
+        enable_refit=args.build_enable_refit, enable_all_tactics=args.build_all_tactics, \
         timing_cache=args.timing_cache, onnx_refit_dir=args.onnx_refit_dir)
     demo.loadResources(image_height, image_width, batch_size, args.seed)
 

diff --git a/demo/Diffusion/requirements.txt b/demo/Diffusion/requirements.txt
@@ -11,5 +11,5 @@ onnxruntime==1.14.1
 onnx-graphsurgeon==0.3.26
 polygraphy==0.47.1
 scipy
-torch<2.0.0
+torch
 transformers==4.26.1
diff --git a/demo/Diffusion/stable_diffusion_pipeline.py b/demo/Diffusion/stable_diffusion_pipeline.py
@@ -195,7 +195,6 @@ def loadEngines(
         static_batch=False,
         static_shape=True,
         enable_refit=False,
-        enable_preview=False,
         enable_all_tactics=False,
         timing_cache=None,
         onnx_refit_dir=None,
@@ -229,8 +228,6 @@ def loadEngines(
                 Build engine only for specified opt_image_height & opt_image_width. Default = True.
             enable_refit (bool):
                 Build engines with refit option enabled.
-            enable_preview (bool):
-                Enable TensorRT preview features.
             enable_all_tactics (bool):
                 Enable all tactic sources during TensorRT engine builds.
             timing_cache (str):
@@ -304,7 +301,6 @@ def loadEngines(
                         static_batch=static_batch, static_shape=static_shape
                     ),
                     enable_refit=enable_refit,
-                    enable_preview=enable_preview,
                     enable_all_tactics=enable_all_tactics,
                     timing_cache=timing_cache,
                     workspace_size=self.max_workspace_size)

diff --git a/demo/Diffusion/utilities.py b/demo/Diffusion/utilities.py
@@ -190,7 +190,7 @@ def map_name(name):
             print("Failed to refit!")
             exit(0)
 
-    def build(self, onnx_path, fp16, input_profile=None, enable_refit=False, enable_preview=False, enable_all_tactics=False, timing_cache=None, workspace_size=0):
+    def build(self, onnx_path, fp16, input_profile=None, enable_refit=False, enable_all_tactics=False, timing_cache=None, workspace_size=0):
         print(f"Building TensorRT engine for {onnx_path}: {self.engine_path}")
         p = Profile()
         if input_profile:
@@ -200,10 +200,6 @@ def build(self, onnx_path, fp16, input_profile=None, enable_refit=False, enable_
 
         config_kwargs = {}
 
-        config_kwargs['preview_features'] = [trt.PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
-        if enable_preview:
-            # Faster dynamic shapes made optional since it increases engine build time.
-            config_kwargs['preview_features'].append(trt.PreviewFeature.FASTER_DYNAMIC_SHAPES_0805)
         if workspace_size > 0:
             config_kwargs['memory_pool_limits'] = {trt.MemoryPoolType.WORKSPACE: workspace_size}
         if not enable_all_tactics:
@@ -1201,7 +1197,6 @@ def add_arguments(parser):
     parser.add_argument('--build-static-batch', action='store_true', help="Build TensorRT engines with fixed batch size.")
     parser.add_argument('--build-dynamic-shape', action='store_true', help="Build TensorRT engines with dynamic image shapes.")
     parser.add_argument('--build-enable-refit', action='store_true', help="Enable Refit option in TensorRT engines during build.")
-    parser.add_argument('--build-preview-features', action='store_true', help="Build TensorRT engines with preview features.")
     parser.add_argument('--build-all-tactics', action='store_true', help="Build TensorRT engines using all tactic sources.")
     parser.add_argument('--timing-cache', default=None, type=str, help="Path to the precached timing measurements to accelerate build.")
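
With `--build-preview-features` dropped from `add_arguments`, command lines that still pass the flag will find it is no longer recognized. The sketch below rebuilds only the build flags visible in this hunk in a standalone parser. It is an illustration, not code from the commit: the `build_parser` helper is hypothetical, and `parse_known_args` is used only so the leftover flag can be inspected instead of aborting the run.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: the build flags that remain in this hunk."""
    parser = argparse.ArgumentParser(description="Sketch of the remaining build flags")
    parser.add_argument('--build-static-batch', action='store_true',
                        help="Build TensorRT engines with fixed batch size.")
    parser.add_argument('--build-dynamic-shape', action='store_true',
                        help="Build TensorRT engines with dynamic image shapes.")
    parser.add_argument('--build-enable-refit', action='store_true',
                        help="Enable Refit option in TensorRT engines during build.")
    parser.add_argument('--build-all-tactics', action='store_true',
                        help="Build TensorRT engines using all tactic sources.")
    parser.add_argument('--timing-cache', default=None, type=str,
                        help="Path to the precached timing measurements to accelerate build.")
    return parser


# An old-style command line that still passes the removed flag.
old_argv = ['--build-enable-refit', '--build-preview-features']

# parse_known_args returns the parsed namespace plus any unrecognized arguments.
args, unknown = build_parser().parse_known_args(old_argv)

print(args.build_enable_refit)  # True
print(unknown)                  # ['--build-preview-features']
```

If the demo scripts parse with plain `parse_args()` (an assumption, since that call is not shown in the diff), the same command line would instead exit with an "unrecognized arguments" error, so existing launch scripts may need the flag removed.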
 

diff --git a/demo/HuggingFace/GPT2/GPT2ModelConfig.py b/demo/HuggingFace/GPT2/GPT2ModelConfig.py
@@ -51,7 +51,7 @@ def add_args(parser: argparse.ArgumentParser) -> None:
         network_group.add_argument(
             "--num-beams", type=int, default=1, help="Enables beam search during decoding."
         )
-        
+
         network_group.add_argument(
             "--fp16", action="store_true", help="Enables fp16 TensorRT tactics."
         )
@@ -84,7 +84,7 @@ def add_benchmarking_args(parser: argparse.ArgumentParser) -> None:
 
 
 class GPT2ModelTRTConfig(NNConfig):
-    TARGET_MODELS = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl", "EleutherAI/gpt-j-6B"]
+    TARGET_MODELS = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl", "EleutherAI/gpt-j-6b"]
     NETWORK_DECODER_SEGMENT_NAME = "gpt2_decoder"
     NETWORK_SEGMENTS = [NETWORK_DECODER_SEGMENT_NAME]
     NETWORK_FULL_NAME = "full"

diff --git a/docker/centos-7.Dockerfile b/docker/centos-7.Dockerfile
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 
-ARG CUDA_VERSION=12.0.1
+ARG CUDA_VERSION=12.1.1
 
 FROM nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-centos7
 LABEL maintainer="NVIDIA CORPORATION"
@@ -60,7 +60,11 @@ RUN if [ "${CUDA_VERSION}" = "10.2" ] ; then \
         libnvinfer-lean-devel-=${v} libnvinfer-vc-plugin8-=${v} libnvinfer-vc-plugin-devel-=${v} \
         libnvinfer-headers-devel-=${v} libnvinfer-headers-plugin-devel-=${v}; \
 else \
-    v="${TRT_VERSION}-1.cuda${CUDA_VERSION%.*}" &&\
+    ver="${CUDA_VERSION%.*}" &&\
+    if [ "${ver%.*}" = "12" ] ; then \
+        ver="12.0"; \
+    fi &&\
+    v="${TRT_VERSION}-1.cuda${ver}" &&\
     yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo &&\
     yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} \
         libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} \

diff --git a/docker/ubuntu-20.04-aarch64.Dockerfile b/docker/ubuntu-20.04-aarch64.Dockerfile
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 
-ARG CUDA_VERSION=12.0.1
+ARG CUDA_VERSION=12.2.0
 
 # Multi-arch container support available in non-cudnn containers.
 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04
@@ -69,7 +69,11 @@ RUN apt-get install -y --no-install-recommends \
     ln -s /usr/bin/pip3 pip;
 
 # Install TensorRT. This will also pull in CUDNN
-RUN v="${TRT_VERSION}-1+cuda${CUDA_VERSION%.*}" &&\
+RUN ver="${CUDA_VERSION%.*}" &&\
+    if [ "${ver%.*}" = "12" ] ; then \
+        ver="12.0"; \
+    fi &&\
+    v="${TRT_VERSION}-1+cuda${ver}" &&\
     apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/sbsa/3bf863cc.pub &&\
     apt-get update &&\
     sudo apt-get -y install libnvinfer8=${v} libnvonnxparsers8=${v} libnvparsers8=${v} libnvinfer-plugin8=${v} \

diff --git a/docker/ubuntu-20.04.Dockerfile b/docker/ubuntu-20.04.Dockerfile
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 
-ARG CUDA_VERSION=12.0.1
+ARG CUDA_VERSION=12.1.1
 
 FROM nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu20.04
 LABEL maintainer="NVIDIA CORPORATION"
@@ -79,7 +79,11 @@ RUN if [ "${CUDA_VERSION}" = "10.2" ] ; then \
         libnvinfer-lean-dev=${v} libnvinfer-vc-plugin8=${v} libnvinfer-vc-plugin-dev=${v} \
         libnvinfer-headers-dev=${v} libnvinfer-headers-plugin-dev=${v}; \
 else \
-    v="${TRT_VERSION}-1+cuda${CUDA_VERSION%.*}" &&\
+    ver="${CUDA_VERSION%.*}" &&\
+    if [ "${ver%.*}" = "12" ] ; then \
+        ver="12.0"; \
+    fi &&\
+    v="${TRT_VERSION}-1+cuda${ver}" &&\
     apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub &&\
     apt-get update &&\
     sudo apt-get -y install libnvinfer8=${v} libnvonnxparsers8=${v} libnvparsers8=${v} libnvinfer-plugin8=${v} \
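
The three Dockerfile hunks above apply the same version mapping before installing the TensorRT packages. As a rough sketch, the shell parameter expansions can be restated in Python as shown below. This is an illustration rather than code from the commit: the function names `strip_last_component`, `trt_cuda_tag`, and `package_version` are hypothetical, and the TensorRT version string is a placeholder.

```python
def strip_last_component(version: str) -> str:
    """Mimic the shell expansion ${var%.*}: drop the shortest trailing '.suffix'."""
    head, sep, _ = version.rpartition(".")
    # Without a dot, the shell pattern does not match and the value is unchanged.
    return head if sep else version


def trt_cuda_tag(cuda_version: str) -> str:
    """Return the CUDA tag used in the TensorRT package version string."""
    ver = strip_last_component(cuda_version)   # ver="${CUDA_VERSION%.*}"
    if strip_last_component(ver) == "12":      # if [ "${ver%.*}" = "12" ]
        ver = "12.0"                           # every 12.x maps to 12.0
    return ver


def package_version(trt_version: str, cuda_version: str, sep: str = "+") -> str:
    """Build v="${TRT_VERSION}-1+cuda${ver}" (the CentOS hunk uses '.' as sep)."""
    return f"{trt_version}-1{sep}cuda{trt_cuda_tag(cuda_version)}"


print(trt_cuda_tag("12.1.1"))  # 12.0
print(trt_cuda_tag("12.2.0"))  # 12.0
print(trt_cuda_tag("11.8.0"))  # 11.8

# "X.Y.Z" is a placeholder, not the TRT_VERSION value used by the Dockerfiles.
print(package_version("X.Y.Z", "12.1.1"))           # X.Y.Z-1+cuda12.0
print(package_version("X.Y.Z", "12.1.1", sep="."))  # X.Y.Z-1.cuda12.0
```

Read this way, the change lets the base image move to CUDA 12.1.1 or 12.2.0 while the install step still requests packages tagged `cuda12.0`, and CUDA 11.x versions keep their `major.minor` tag as before.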

diff --git a/python/CMakeLists.txt b/python/CMakeLists.txt
@@ -35,7 +35,7 @@ endfunction()
 # -------- CMAKE OPTIONS --------
 
 set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/${TENSORRT_MODULE}/)
-set(CPP_STANDARD 11 CACHE STRING "CPP Standard Version")
+set(CPP_STANDARD 14 CACHE STRING "CPP Standard Version")
 set(CMAKE_CXX_STANDARD ${CPP_STANDARD})
 
 if (NOT MSVC)

diff --git a/python/README.md b/python/README.md
@@ -2,6 +2,11 @@
 
 ## Installation
 
+### Set environment variables
+
+Set `TRT_OSSPATH` and `TRT_LIBPATH` environment variables to point to your OSS clone
+and TensorRT library location, respectively.
+
 ### Download pybind11
 
 Create a directory for external sources and download pybind11 into it.
@@ -19,12 +24,12 @@ git clone https://github.com/pybind/pybind11.git
 1. Get the source code from the official [python sources](https://www.python.org/downloads/source/)
 2. Copy the contents of the `Include/` directory into `$EXT_PATH/pythonX.Y/include/` directory.
 
-Example: Python 3.9
+Example: Python 3.10
 ```bash
-wget https://www.python.org/ftp/python/3.9.16/Python-3.9.16.tgz
-tar -xvf Python-3.9.16.tgz
-mkdir -p $EXT_PATH/python3.9
-cp -r Python-3.9.16/Include/ $EXT_PATH/python3.9/include
+wget https://www.python.org/ftp/python/3.10.11/Python-3.10.11.tgz
+tar -xvf Python-3.10.11.tgz
+mkdir -p $EXT_PATH/python3.10/include
+cp -r Python-3.10.11/Include/* $EXT_PATH/python3.10/include
 ```
 
 #### Add PyConfig.h
@@ -36,15 +41,22 @@ cp -r Python-3.9.16/Include/ $EXT_PATH/python3.9/include
 3. Unpack the contained `data.tar.xz` with `tar -xvf`
 4. Find `pyconfig.h` in the `./usr/include/<platform>/pythonX.Y/` directory and copy it into `$EXT_PATH/pythonX.Y/include/`.
 
+Example: Python 3.10
+```bash
+wget http://http.us.debian.org/debian/pool/main/p/python3.10/libpython3.10-dev_3.10.12-1_amd64.deb
+ar x libpython3.10-dev*.deb
+mkdir debian && tar -xf data.tar.xz -C debian
+cp debian/usr/include/x86_64-linux-gnu/python3.10/pyconfig.h python3.10/include/
+```
 
 ### Build Python bindings
 
-Use `build.sh` to generate the installable wheels for intended python version and target architecture.
+Use `build.sh` to generate the installable wheels for the intended Python version and target architecture.
 
-Example: for Python 3.9 `x86_64` wheel,
+Example: for Python 3.10 `x86_64` wheel,
 ```bash
 cd $TRT_OSSPATH/python
-TENSORRT_MODULE=tensorrt PYTHON_MAJOR_VERSION=3 PYTHON_MINOR_VERSION=9 TARGET_ARCHITECTURE=x86_64 ./build.sh
+TENSORRT_MODULE=tensorrt PYTHON_MAJOR_VERSION=3 PYTHON_MINOR_VERSION=10 TARGET_ARCHITECTURE=x86_64 ./build.sh
 ```
 
 ### Install the python wheel

diff --git a/python/build.sh b/python/build.sh
@@ -62,10 +62,10 @@ pushd ${ROOT_PATH}/python/packaging
 for dir in $(find . -type d); do mkdir -p ${WHEEL_OUTPUT_DIR}/$dir; done
 for file in $(find . -type f); do expand_vars_cp $file ${WHEEL_OUTPUT_DIR}/${file}; done
 popd
+
 cp tensorrt/tensorrt.so bindings_wheel/tensorrt/tensorrt.so
 
 pushd ${WHEEL_OUTPUT_DIR}/bindings_wheel
-
 python3 setup.py -q bdist_wheel --python-tag=cp${PYTHON_MAJOR_VERSION}${PYTHON_MINOR_VERSION} --plat-name=linux_${TARGET}
 
 popd