FFmpeg QSV Multi GPU Selection on Linux

The FFmpeg command line has a number of options to select a GPU in the multi-device case. Appropriate usage of these options can be tricky. This article summarizes the most typical use cases, paying attention to the tricky points.

In the examples given in this article we will use input content which can be obtained as follows (stream resolution is 176x144):

wget https://fate-suite.libav.org/h264-conformance/AUD_MW_E.264
ffmpeg -i AUD_MW_E.264 -c:v rawvideo -pix_fmt yuv420p -y AUD_MW_E.yuv

We will consider a system with 2 Intel GPUs and provide command lines which will schedule all GPU operations within the pipeline on the specified device.

This article does not attempt to achieve the best performance; the example command lines are simplified as much as possible to just explore the scheduling options. Refer to other materials on how to achieve better quality and performance with ffmpeg-qsv.
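
Throughout the examples the two GPUs are assumed to be exposed as /dev/dri/renderD128 and /dev/dri/renderD129. As a quick sanity check you can list the DRI render nodes available on your system (the exact node names depend on your setup):

ls -l /dev/dri/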

Transcoding with QSV decode and QSV encode, selecting the device with -qsv_device.

To schedule on /dev/dri/renderD128:

ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v h264_qsv -i AUD_MW_E.264 -c:v h264_qsv -y out.264

To schedule on /dev/dri/renderD129:

ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 -c:v h264_qsv -i AUD_MW_E.264 -c:v h264_qsv -y out.264

Transcoding with software (native) h264 decode and QSV encode, selecting the device with -init_hw_device.

To schedule on /dev/dri/renderD128:

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device qsv=hw@va -c:v h264 -i AUD_MW_E.264 -c:v h264_qsv -y out.264

To schedule on /dev/dri/renderD129:

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD129 -init_hw_device qsv=hw@va -c:v h264 -i AUD_MW_E.264 -c:v h264_qsv -y out.264

QSV decode only, downloading the decoded frames to system memory (hwdownload) and writing raw YUV, selecting the device with -qsv_device.

To schedule on /dev/dri/renderD128:

ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 \
  -c:v h264_qsv -i AUD_MW_E.264 -vf hwdownload,format=nv12 -pix_fmt yuv420p -y out.yuv

To schedule on /dev/dri/renderD129:

ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 \
  -c:v h264_qsv -i AUD_MW_E.264 -vf hwdownload,format=nv12 -pix_fmt yuv420p -y out.yuv

QSV encode only from raw YUV input, selecting the device with -init_hw_device.

To schedule on /dev/dri/renderD128:

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device qsv=hw@va \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv -c:v h264_qsv -y out.h264

To schedule on /dev/dri/renderD129:

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD129 -init_hw_device qsv=hw@va \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv -c:v h264_qsv -y out.h264

QSV encode with an explicit upload of the raw frames to video memory (hwupload), selecting the device with -init_hw_device and -filter_hw_device.

To schedule on /dev/dri/renderD128:

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device qsv=hw@va -filter_hw_device hw \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv -vf hwupload=extra_hw_frames=64,format=qsv -c:v h264_qsv -y out.h264

To schedule on /dev/dri/renderD129:

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD129 -init_hw_device qsv=hw@va -filter_hw_device hw \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv -vf hwupload=extra_hw_frames=64,format=qsv -c:v h264_qsv -y out.h264

The following table summarizes which pipeline component each device selection option applies to:

| Option | Applies to |
| --- | --- |
| -hwaccel qsv | QSV decoders |
| -hwaccel_device | Does not apply to QSV |
| -qsv_device | -hwaccel qsv |
| -init_hw_device | QSV encoders |
| -filter_hw_device | QSV filters, QSV HW upload |

In general, the -hwaccel option applies to the input stream, i.e. to the 1st component in the pipeline. If the first component is a QSV decoder, the option applies to it, and the QSV encoder (2nd in the pipeline) will pick up the same device through the frames context. But if the 1st component is not a QSV decoder (as in the encoding examples, where the 1st component is rawvideo), the option is simply ignored (by rawvideo); the encoder won't be able to pick up a device from the frames context (since the frames will be raw system memory frames) and will work on the default device.
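
For example, based on the behavior described above, in the following command (which reuses the raw input prepared earlier) -qsv_device would be expected to have no effect, since the 1st component is rawvideo, and the QSV encoder would end up on the default device rather than on /dev/dri/renderD129:

# -qsv_device is expected to be ignored here: the 1st component is rawvideo, not a QSV decoder
ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 \
  -f rawvideo -pix_fmt yuv420p -s:v 176x144 -i AUD_MW_E.yuv -c:v h264_qsv -y out.h264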

The -hwaccel_device option is supposed to select the device for the -hwaccel option, but it is not actually implemented for QSV. Instead, we need to use -qsv_device.

-init_hw_device initializes a HW device and adds it to the global list. QSV encoders are capable of picking devices from this list, but QSV decoders are not (historically, QSV encoders were written later and implemented this path, while QSV decoders use their own ad-hoc path).
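
To illustrate the point (a sketch based on the statement above): in the following decode command the h264_qsv decoder would not pick up the device initialized via -init_hw_device and would open the default device instead; for decoding, select the device with -qsv_device as in the earlier examples.

# -init_hw_device is expected to be ignored by the QSV decoder
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD129 -init_hw_device qsv=hw@va \
  -c:v h264_qsv -i AUD_MW_E.264 -vf hwdownload,format=nv12 -pix_fmt yuv420p -y out.yuv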

Finally, -filter_hw_device allows specifying the device for filters and covers the HW upload case (which, in the case of QSV, uses the GPU to copy frames from system memory to video memory).

How can we check that the selected GPU is actually the one loaded by the command lines above? For that we can use the Linux perf tool, which can show whether tasks are running on the engines of Intel GPUs.

The list of available GPU engines is GPU dependent. To get it, execute:

$ sudo perf list | grep i915 | grep busy
  i915/bcs0-busy/                                    [Kernel PMU event]
  i915/rcs0-busy/                                    [Kernel PMU event]
  i915/vcs0-busy/                                    [Kernel PMU event]
  i915/vecs0-busy/                                   [Kernel PMU event]
  i915_0000_03_00.0/bcs0-busy/                       [Kernel PMU event]
  i915_0000_03_00.0/rcs0-busy/                       [Kernel PMU event]
  i915_0000_03_00.0/vcs0-busy/                       [Kernel PMU event]
  i915_0000_03_00.0/vcs1-busy/                       [Kernel PMU event]
  i915_0000_03_00.0/vecs0-busy/                      [Kernel PMU event]

In this example we have a system with 2 enabled Intel GPU devices. The first GPU has 4 engines and is an Intel integrated GPU (its events follow the pattern i915/<engine>-<event>). The second GPU is an Intel discrete GPU and has 5 engines (event pattern i915_<pci>/<engine>-<event>). For the purposes of this article we will use only the "busy" events, which give the time during which the engines were actually executing tasks.
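
To tell which /dev/dri/renderD* node a PCI-based PMU name (i915_0000_03_00.0 in this example) corresponds to, one option (assuming a typical sysfs layout) is to resolve the render node's device link and compare the PCI address:

readlink -f /sys/class/drm/renderD129/device
# prints something like /sys/devices/pci0000:00/0000:03:00.0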

Now we can run the following script to monitor the activity of both GPUs:

events=""
# events for first GPU
events+=i915/bcs0-busy/,
events+=i915/rcs0-busy/,
events+=i915/vcs0-busy/,
events+=i915/vecs0-busy/,
# events for second GPU
events+=i915_0000_03_00.0/bcs0-busy/,
events+=i915_0000_03_00.0/rcs0-busy/,
events+=i915_0000_03_00.0/vcs0-busy/,
events+=i915_0000_03_00.0/vcs1-busy/,
events+=i915_0000_03_00.0/vecs0-busy/
#
sudo perf stat -a -I 1000 -e $events \
  /bin/bash -c "while :; do echo 'Press [CTRL+C] to stop..'; sleep 1; done"

The script prints the utilization of both GPUs' engines (the number of nanoseconds during which the engines were running tasks) every second. For example:

    9.003608840                  0 ns   i915/bcs0-busy/
    9.003608840                  0 ns   i915/rcs0-busy/
    9.003608840         33,478,100 ns   i915/vcs0-busy/
    9.003608840                  0 ns   i915/vecs0-busy/
    9.003608840            511,984 ns   i915_0000_03_00.0/bcs0-busy/
    9.003608840        147,219,205 ns   i915_0000_03_00.0/rcs0-busy/
    9.003608840         25,717,925 ns   i915_0000_03_00.0/vcs0-busy/
    9.003608840                  0 ns   i915_0000_03_00.0/vcs1-busy/
    9.003608840         40,396,895 ns   i915_0000_03_00.0/vecs0-busy/
Press [CTRL+C] to stop..

As you can see, in this example we have tasks running on both GPUs, but not all engines are actually busy.

Running this script in parallel with the ffmpeg examples given above, you can check onto which GPU each workload was actually scheduled.
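
Alternatively, you can wrap a single ffmpeg command directly with perf stat (a sketch assuming the events variable is set as in the script above; with -a the counters are collected system-wide for the duration of the command):

sudo perf stat -a -e $events \
  ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD129 -c:v h264_qsv -i AUD_MW_E.264 -c:v h264_qsv -y out.264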

For more details on Linux perf usage refer to Performance monitoring and debug w/ Linux perf.
