This guide explains how to reproduce the speed results of YOLOv6. For a fair comparison, the speed results do not include the time cost of data pre-processing or NMS post-processing.
Download the models you want to test from the latest release.
Refer to the README and install the packages that match your CUDA, cuDNN and TensorRT versions.
Here, we use Torch 1.8.0 for inference on V100, and TensorRT 7.2 with CUDA 10.2 and cuDNN 8.0.2 on T4.
To get inference speed without TensorRT on V100, you can run the following command:
python tools/eval.py --data data/coco.yaml --batch 32 --weights yolov6n.pt --task speed [--half]
- Speed results with batch size = 1 are unstable across multiple runs, so we do not provide the bs1 speed results.
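For reference, the sketch below shows one way to time only the model forward pass on dummy input, excluding pre-processing and NMS, which is what the speed protocol above measures. It is not part of the repository: it assumes model is an already-loaded YOLOv6 PyTorch module on a CUDA device, and the function name measure_latency is ours.

```python
# Minimal latency-measurement sketch (assumption: `model` is a loaded YOLOv6
# module on a CUDA device). Only the forward pass is timed; pre-processing
# and NMS post-processing are intentionally excluded.
import time
import torch

def measure_latency(model, batch_size=32, img_size=640, half=True, runs=100, warmup=10):
    device = next(model.parameters()).device
    model.eval()
    if half:
        model.half()
    dummy = torch.zeros(batch_size, 3, img_size, img_size, device=device)
    dummy = dummy.half() if half else dummy.float()
    with torch.no_grad():
        for _ in range(warmup):       # warm-up iterations, not timed
            model(dummy)
        torch.cuda.synchronize()      # make sure warm-up kernels have finished
        start = time.time()
        for _ in range(runs):         # timed forward passes only
            model(dummy)
        torch.cuda.synchronize()      # wait for all CUDA kernels before stopping the clock
    return (time.time() - start) / runs * 1000.0  # average latency in ms per batch
```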
To get inference speed with TensorRT in FP16 mode on T4, you can follow the steps below:
First, export the PyTorch model to ONNX format using the following command:
python deploy/ONNX/export_onnx.py --weights yolov6n.pt --device 0 --simplify --batch [1 or 32]
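Optionally, you can sanity-check the exported ONNX file before building the engine. The snippet below is a minimal sketch, assuming the onnx and onnxruntime packages are installed (they are not required by the speed test itself); the printed input name and shapes are only examples.

```python
# Optional sanity check of the exported ONNX model (assumes onnx + onnxruntime installed).
import numpy as np
import onnx
import onnxruntime as ort

model = onnx.load("yolov6n.onnx")
onnx.checker.check_model(model)                      # structural validity check

sess = ort.InferenceSession("yolov6n.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)                           # inspect the input name and shape
dummy = np.zeros([d if isinstance(d, int) else 1 for d in inp.shape], dtype=np.float32)
outputs = sess.run(None, {inp.name: dummy})
print([o.shape for o in outputs])                    # confirm the output shapes look sane
```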
Second, generate an inference TRT engine and test its speed using trtexec:
trtexec --explicitBatch --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --buildOnly --workspace=1024 --onnx=yolov6n.onnx --saveEngine=yolov6n.trt
trtexec --fp16 --avgRuns=1000 --workspace=1024 --loadEngine=yolov6n.trt
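When the second trtexec command finishes, the latency to report is typically the mean value in the GPU Compute section of its performance summary, which covers only the engine execution time and therefore matches the protocol above (no pre-processing or NMS included).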