QEMU Audio support in VNC Viewer (currently Windows only) #1478
base: master
Conversation
Added support for VNC protocol extension "QEMU audio" to the VNC Viewer (Windows only).
Thanks for your PR! This is definitely something we are interested in.
What is the easiest way for us to go about testing this?
nAudioBytesLeft -= consumed;
}
if (nAudioBytesLeft != 0)
  return false;
This complexity is something that doesn't really fit well in CMsgReader. How large are these audio chunks? Can't we let the input stream handle the buffering?
Audio chunks may be large; there are no protocol limits besides UINT32_MAX. An ill-behaved VNC server may outgrow MAX_BUF_SIZE in common/rdr/BufferedInStream.cxx. That's why I prefer not to leave audio samples in the input buffer of BufferedInStream, and instead move them to the circular playback buffer of Win32AudioOutput as soon as they arrive. The memcpy calls happen in Win32AudioOutput::addSamples(), usually once, or twice if a wraparound in the target circular buffer occurs; the caller of that interface must provide a pointer to, and length of, a contiguous chunk of source data. This contiguous chunk is owned by BufferedInStream.
When an audio frame is fully received, Win32AudioOutput::submitSamples() makes one or two heap allocations of struct Win32AudioOutput::HdrInSlist, then calls the waveOutPrepareHeader() WinAPI function (which boils down to allocating a struct OVERLAPPED on the heap, creating a kernel event object, and some pointer bookkeeping), then calls the waveOutWrite() WinAPI function (which boils down to submitting an asynchronous I/O request to the NT kernel device driver, the audio mixer). There's no need to move bytes around in this context: the asynchronous NT I/O request in flight refers to the audio samples in the circular buffer owned by Win32AudioOutput, which were added there earlier by Win32AudioOutput::addSamples(). So while the audio playback I/O request is still in flight, the original socket buffer owned by BufferedInStream may contain different data, may be reallocated, may be empty, etc.
When the kernel completes the asynchronous I/O request, it signals via the kernel event object and wakes an internal worker thread started by winmm.dll/wdmaud.drv. This thread invokes our Win32AudioOutput::waveOutCallback(). There we link the struct Win32AudioOutput::HdrInSlist into a singly linked list to dispose of it later, during the next call to Win32AudioOutput::submitSamples() or in the destructor of Win32AudioOutput. Before disposing of a Win32AudioOutput::HdrInSlist we call waveOutUnprepareHeader(), which boils down to deallocating the struct OVERLAPPED on the heap and destroying the kernel event object.
If an ill-behaved VNC server sends an unreasonably large audio frame, then we clamp it in Win32AudioOutput::addSamples(): samples which do not fit into the circular buffer owned by Win32AudioOutput are de facto discarded, but reported as consumed to the caller, so the caller can release this space in BufferedInStream, and continue to do so until Win32AudioOutput::submitSamples() is called at the end of the audio frame.
Summarizing all of the above, I think the two buffers (the socket I/O buffer and the async playback buffer) are both necessary.
So, what we need from BufferedInStream in this context of CMsgReader is an API to fetch some fresh data from the socket and provide a raw pointer to a contiguous chunk of memory along with the number of available bytes. There's an upper limit on the number of bytes we need here (nAudioBytesLeft), and there's no lower limit. We're happy to consume fewer than nAudioBytesLeft bytes here (especially if nAudioBytesLeft is large, i.e. this is a large audio frame). If there's a partial audio sample/frame (e.g. fewer than 4 bytes for the default audio format, 16-bit stereo), then up to 3 bytes will be left unconsumed in the BufferedInStream, for later use when more data arrives.
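The circular-buffer scheme described above (one or two memcpy calls per chunk, with oversized input clamped but reported as consumed) can be sketched roughly like this. This is an illustrative model only, not the PR's actual code; the names merely echo Win32AudioOutput::addSamples(), and the real buffer additionally stays pinned while asynchronous waveOutWrite() requests refer to it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

class CircularAudioBuffer {
public:
  explicit CircularAudioBuffer(size_t capacity)
    : buf(new unsigned char[capacity]), cap(capacity) {}
  ~CircularAudioBuffer() { delete[] buf; }

  // Copy as much of the contiguous source chunk as fits: one memcpy,
  // or two when the write wraps around the end of the buffer. Samples
  // that do not fit are discarded, but still reported as consumed, so
  // the caller can always release the space in its socket buffer.
  size_t addSamples(const unsigned char* src, size_t len) {
    size_t fit = std::min(len, cap - used);
    size_t tail = (head + used) % cap;
    size_t first = std::min(fit, cap - tail);   // bytes before the wrap point
    memcpy(buf + tail, src, first);
    memcpy(buf, src + first, fit - first);      // wrapped remainder, if any
    used += fit;
    return len;                                 // oversized input is clamped
  }

  // Called once the playback request covering n bytes has completed.
  void markPlayed(size_t n) { head = (head + n) % cap; used -= n; }

  size_t bytesBuffered() const { return used; }

private:
  unsigned char* buf;
  size_t cap;
  size_t head = 0, used = 0;
};
```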
The double buffering sounds fine, given the varying lifetime.
But I'd still like to get rid of the complexity. Since there is a limit anyway (which seems to be about 1 MiB if I read the code correctly), then this extra complexity doesn't seem worth it.
We don't need to support the theoretical extremes. Supporting what is actually used out there is quite sufficient.
I think this complexity can be reduced by adding a new method to rdr::InStream which takes the number of bytes wanted by the caller, similar to hasData(), but unlike hasData() does not require all of those bytes to be available. Instead it just returns a pointer to, and the length of, what is available; the length may be less than requested. Also, this new method must call rdr::InStream::overrun() so that actual I/O on the socket happens. Then the code in CMsgReader will be shorter. What do you think?
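A rough shape of the proposed method might look like the following. Everything here is hypothetical: the name availData(), the member layout, and the overrun() contract are stand-ins for illustration, not the real TigerVNC rdr::InStream API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

class SketchInStream {
public:
  // Like hasData(wanted), but never waits for all `wanted` bytes: it
  // performs at most one round of real I/O (modeled by overrun()) and
  // then reports whatever contiguous data is actually buffered.
  size_t availData(size_t wanted, const uint8_t** data) {
    if (end == ptr)
      overrun(1);                        // trigger actual socket I/O
    size_t n = std::min(wanted, (size_t)(end - ptr));
    *data = ptr;
    return n;                            // may be anywhere in [0, wanted]
  }

protected:
  virtual ~SketchInStream() = default;
  virtual void overrun(size_t needed) = 0;  // refills [ptr, end)
  const uint8_t* ptr = nullptr;
  const uint8_t* end = nullptr;
};
```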
Also, I think I need to provide justification for MSGSTATE_AUDIO_DATA. It's true that an entire 1 MiB of audio samples can fit in the socket I/O buffer and be submitted for playback in one go. However, consider an ill-behaved VNC server which sends 1 GiB of audio data: 1 MiB of that data is going to be submitted for playback and the rest (1023 MiB) is going to be discarded. How can we model this logic in terms of rdr::InStream, i.e. "keep N bytes at the beginning of the buffer, but discard everything above N"? I think we cannot: rdr::InStream is designed for FIFO-style consumption of incoming data, and we can't cut away samples/bytes/whatever in the middle of that buffer. Hence the complexity in CMsgReader, the new state of the FSM, and the double buffering. It's just a coincidence that the double buffering fits nicely with the asynchronous nature of audio I/O on Windows.
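The consume-and-discard loop being argued for here can be modeled in a few lines. The stand-in types below are assumptions (the real code lives in CMsgReader and Win32AudioOutput); the point is only to show the FIFO-style drain in which the FSM stays in the audio-data state until the whole frame, even a mostly-discarded one, has passed through.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal stand-in for a buffered input stream.
struct MiniStream {
  std::vector<uint8_t> buf;
  size_t pos = 0;
  size_t peek(size_t wanted, const uint8_t** data) {
    size_t n = std::min(wanted, buf.size() - pos);
    *data = buf.data() + pos;
    return n;
  }
  void skip(size_t n) { pos += n; }
};

// Minimal stand-in for the playback buffer: clamps what it keeps, but
// reports everything as consumed, as described in the discussion above.
struct MiniAudioOut {
  size_t capacity;
  size_t accepted = 0;
  bool submitted = false;
  explicit MiniAudioOut(size_t cap) : capacity(cap) {}
  size_t addSamples(const uint8_t*, size_t len) {
    accepted += std::min(len, capacity - accepted);
    return len;
  }
  void submitSamples() { submitted = true; }
};

// Drain one audio frame FIFO-style. Returns false while more socket
// data is still needed (the FSM would stay in MSGSTATE_AUDIO_DATA).
bool readAudioData(MiniStream* is, MiniAudioOut* out, size_t& nAudioBytesLeft) {
  while (nAudioBytesLeft > 0) {
    const uint8_t* data;
    size_t avail = is->peek(nAudioBytesLeft, &data);
    if (avail == 0)
      return false;                    // wait for more data to arrive
    size_t consumed = out->addSamples(data, avail);  // may discard internally
    is->skip(consumed);                // space in the stream is always freed
    nAudioBytesLeft -= consumed;
  }
  out->submitSamples();                // end of frame
  return true;
}
```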
for (rdr::U8 bits_per_sample = 16; bits_per_sample != 0; bits_per_sample -= 8) {
  for (rdr::U8 n_channels = 2; n_channels != 0; n_channels--) {
    for (rdr::U8 freq_index = 0; freq_index < 4; freq_index++) {
Doesn't Windows resample for us? Do we really need to try all of these? It would make things a lot simpler if we just had a fixed format we always use.
We can always use 44100 Hz, 16 bits, stereo; this should be sufficient for most cases. This is what is tried first, and it usually succeeds. However, if that fails, why not try other formats? In the QEMU audio extension of the VNC protocol, it is the client that determines the audio format to be used, and the VNC server must perform resampling if necessary.
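The best-first probing that the triple loop in the diff performs can be sketched in a platform-neutral way as follows. The frequency table and the pickFormat/supported names are assumptions for illustration; in the real code the predicate would be something like a waveOutOpen(..., WAVE_FORMAT_QUERY) probe against the Windows audio driver.

```cpp
#include <cstdint>
#include <functional>

struct AudioFormat {
  uint32_t freq;
  uint8_t channels;
  uint8_t bits;
};

// Walk candidate formats from best to worst (16-bit before 8-bit,
// stereo before mono, higher rates first) and return the first one
// the device accepts, mirroring the loop order in the diff above.
static bool pickFormat(const std::function<bool(const AudioFormat&)>& supported,
                       AudioFormat* chosen)
{
  static const uint32_t freqs[4] = { 48000, 44100, 22050, 11025 };
  for (uint8_t bits = 16; bits != 0; bits -= 8)
    for (uint8_t channels = 2; channels != 0; channels--)
      for (int fi = 0; fi < 4; fi++) {
        AudioFormat f = { freqs[fi], channels, bits };
        if (supported(f)) {
          *chosen = f;
          return true;
        }
      }
  return false;  // no format accepted at all
}
```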
All of this adds a lot of complexity to the code. So if it isn't needed in practice, then it's just a cost.
Have you actually seen it failing to give you the requested format?
I can't test all audio drivers for Windows, but I'm pretty sure it will fail to work on some poor guy's audio card, VM, or whatever else if we just hardcode everything to 44100 Hz, 16-bit stereo.
This reverts commit 31aa3f8.
Added support for VNC protocol extension "QEMU audio" to the VNC Viewer (Windows only).
VNC audio can be tested with recent versions of QEMU (e.g. 7.0.0) by adding the following command line options to qemu:
for recent guests (e.g. Windows 8.1+):
for older guests (e.g. Windows 7, Windows XP):
Windows 7 has a builtin driver for Intel HD Audio, but its sound quality in QEMU is terrible (even for local playback, e.g. via the ALSA or PulseAudio QEMU plugin). This is a well-known problem and is usually resolved by switching to emulation of AC97 and installing a compatible driver for Windows 7 x64, e.g. the Realtek AC97 WDM driver alcwdm.inf version 6.0.1.6305 from 06/19/2009 (Windows hardware ID PCI\VEN_8086&DEV_2415).
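The actual option lines above were evidently lost when this page was captured, so they are left blank. Purely as an illustrative sketch consistent with the surrounding discussion (Intel HD Audio for recent guests, AC97 for older ones): the device wiring, the `-audiodev none` backend, and the id `snd0` below are all assumptions, not taken from the PR.

```shell
# Recent guests (e.g. Windows 8.1+): Intel HD Audio emulation
qemu-system-x86_64 ... -vnc :0 \
  -audiodev none,id=snd0 \
  -device intel-hda -device hda-duplex,audiodev=snd0

# Older guests (e.g. Windows 7, Windows XP): AC97 emulation
qemu-system-x86_64 ... -vnc :0 \
  -audiodev none,id=snd0 \
  -device AC97,audiodev=snd0
```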
I'm afraid I don't have easy access to any Windows guests. I'll try to spin up a Linux machine with those arguments and see how that goes.
Just a quick status update: things are hectic here on my side at the moment, with upcoming vacations, so I'm not sure when I can have a look at this again. So it's not forgotten, but I'm afraid I don't know when I can have a closer look.
Switched default output sampling rate to 48 kHz to avoid downsampling in QEMU for modern Windows guests. Also removed some code duplication in CMsgReader.
Here is a small update to the proposed pull request. My experiments have shown that modern Windows guests (Windows 10, Windows 11) prefer to output audio at 48 kHz, so I've changed the default output rate in the VNC Viewer to 48 kHz to avoid downsampling from 48 kHz to 44.1 kHz in the QEMU host code with subsequent upsampling from 44.1 kHz back to 48 kHz in the kernel audio mixer driver (on the client host where the VNC Viewer runs). Delay-wise and quality-wise, I guess it's better to leave resampling to the Windows guest and then just pass 48 kHz audio through all the way down to the kernel audio mixer driver on the VNC client host. Also, I've reduced the complexity in CMsgReader a bit, as you requested, by moving common code to a new method, CMsgReader::readAudioData().
Might be useful: there's a PoC with cross-platform audio support using the miniaudio library, based on the original @mkupchik commit.
This pull request implements a new feature in the VNC Viewer: support for the VNC protocol extension called "QEMU audio". My testing indicates that it works quite well, including anti-jitter measures (such as audio output device starvation detection and adjusting the audio playback delay accordingly). I've implemented QEMU audio support on the Windows platform only, because this is the only platform I'm familiar with (and the only platform I use TigerVNC on). This is a byproduct of another project I'm currently working on, so I'm just sharing my work in the hope that it will eventually be upstreamed and I won't have to rebase it onto subsequent versions of TigerVNC and recompile it manually.