How to combine, mix and manage sound (speakers and mics) from multiple VMs simultaneously ?

shodan · Aug 6, 2025

Hi,

As I prepare to switch my desktop to a "single GPU passthrough" style hypervisor multi-desktop,
I'm curious: how can I handle sound in a coherent and organized manner ?

With just one guest OS the sound system already gets quite hectic

But how do you handle sound and microphones for multiple VMs ?

Do you have one VM that does all the sound things in one,
interfacing between physical sound devices and virtual sounds from the VMs ?

In my setup, I have
VMs that are enabled one at a time (Proxmox),
a VM that just plays music,
and in all of this I have a voice chat with my friend that I don't want interrupted just because I
switched my GPU VM from Windows to Linux

Does something like "sound vGPU" exist ? can multiple VMs share one sound card ?
Will I need to add half a dozen of those $5 USB sound cards and an analog hardware volume mixer ?
Or can this be done digitally without compromising sound, reliability and especially latency ?

I do feel this is the cutting edge of "hypervisor desktop", I don't know if that's a thing or if I'm just making it up
but it's pretty exciting regardless

shodan · Aug 6, 2025

As far as actually making it work

I figure I could run my voice chat client (Mumble)
on the host, on an fbdev Xorg server shared with Sunshine.
That does seem like quite a heavy-handed solution,
but running the voice chat, the most latency-sensitive part,
on the host would give me a chance

As for the other VMs, if I already have the host handle the audio,
maybe I can make all VMs broadcast their dummy audio to the network
using GStreamer, sending raw audio as multicast to my LAN

Another alternative would be to run an LXC container or VM with "JACK" inside of it.
I am not familiar with how JACK really works, but it seems to handle
being network transparent and has all the audio primitives like mixdown and volume shaping that I might need.
Although an extra layer of complexity like this might make all things audio extra hard for me in the future

alexskysilk · Aug 6, 2025

VMs usually don't output anything to the host audio/video, but that doesn't mean it can't be done. Please describe your intended use case; there may be solutions.

shodan · Aug 6, 2025

I wouldn't say I have an "intended use case"

That's just going to be my computer now, and it's going to run Proxmox with all my stuff in VMs.

I have one GPU, an RTX 3060, and it will get passed around the VMs as I switch between them

I want to start the computer and have it appear to boot directly into Windows, and
as a user, it should look and feel identical to having Windows on bare metal

Then I want to push a special button somewhere and suddenly the computer is a linux desktop

And through that transition, voice chat and the music playing should somehow be uninterrupted

For audio devices I have
4 monitors each with an audio output
a usb wireless headset
and a 3.5mm jack output

Which I am constantly switching between as my friends come and go

This seems like a relatively basic setup so I imagine there are people who have already figured this out ?

I think the cleanest way would be to have the audio devices handled by the proxmox host somehow
receiving and sending audio from multiple VMs and mixing/splitting the audio between them as needed

Whatever is doing the audio routing, I would have to make some systray app for Windows and Linux to change the audio output and volume from inside the VMs, so it remains a 1-2 click operation and not a terminal command typing ordeal each time.

I think PulseAudio is still the favorite for audio over the network, but I'm not sure if there's any way to get Windows to output audio to a PulseAudio server over the network. I experimented a little with GStreamer, just capturing the audio and sending it over the network as multicast. I managed to get latency under 100 ms, but it was still noticeable.
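For readers who want to try the multicast experiment described above, here is a rough sketch of what such a pair of GStreamer pipelines could look like. It is not the poster's actual setup: the multicast group, port, jitter buffer size and the function names are all assumptions made for illustration, and it assumes a Linux guest with PulseAudio on both ends. The functions only print the pipeline, so nothing is launched until you pass the output to gst-launch-1.0.

```shell
#!/bin/sh
# Sketch only: group address, port and function names are assumptions.
MCAST_GROUP="${MCAST_GROUP:-239.255.0.1}"
MCAST_PORT="${MCAST_PORT:-5004}"
# RTP caps describing 48 kHz stereo 16-bit PCM (L16) on a dynamic payload type.
RTP_CAPS="application/x-rtp,media=audio,clock-rate=48000,encoding-name=L16,channels=2,payload=96"

# Print the sender pipeline: capture from PulseAudio, send raw PCM as RTP multicast.
sender_pipeline() {
    echo "pulsesrc ! audioconvert ! audioresample ! audio/x-raw,format=S16BE,rate=48000,channels=2 ! rtpL16pay ! udpsink host=$MCAST_GROUP port=$MCAST_PORT auto-multicast=true"
}

# Print the receiver pipeline: join the group, buffer a few ms, play via PulseAudio.
# $1 is the jitter buffer size in milliseconds (default 20).
receiver_pipeline() {
    echo "udpsrc address=$MCAST_GROUP port=$MCAST_PORT auto-multicast=true ! $RTP_CAPS ! rtpjitterbuffer latency=${1:-20} ! rtpL16depay ! audioconvert ! pulsesink"
}
```

On a guest this could be started with `gst-launch-1.0 $(sender_pipeline)`, and on the machine that owns the speakers with `gst-launch-1.0 $(receiver_pipeline 20)`. Note that pulsesrc captures the default source, so to send what a guest is playing you would likely need to point it at the monitor of the output sink. The jitter buffer is the main latency knob; smaller values trade dropouts for responsiveness.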

I heard JACK might be the tool for it, network transparent and with a focus on low latency and functionality, but for me that's uncharted territory.

alexskysilk · Aug 6, 2025

shodan said:
I want to start the computer and have it appear to boot directly into Windows, and
as a user, it should look and feel identical to having Windows on bare metal

Then you're doing it backwards. Just run Windows on the machine. Windows is perfectly capable of running VMs.

shodan said:
Then I want to push a special button somewhere and suddenly the computer is a linux desktop

Install WSL plus whatever flavor of Linux you want/need. Can't beat that for integration.

shodan said:
I think the cleanest way would be to have the audio devices handled by the proxmox host somehow
receiving and sending audio from multiple VMs and mixing/splitting the audio between them as needed

This is out of scope for Proxmox. It's native for a Windows endpoint since that's what it's for. It COULD be done with a Linux DE, but you specifically stated you want Windows, so that would be backwards too.

Proxmox is a hypervisor. It's not designed or meant to replace a workstation operating system.

emunt6 · Aug 6, 2025

Hi!

Does something like "sound vGPU" exist ? can multiple VMs share one sound card ?
Will I need to add half a dozen of those $5 USB sound cards and an analog hardware volume mixer ?
Or can this be done digitally without compromising sound, reliability and especially latency ?

It is possible, but the result is latency plus scratchy/laggy audio.

So yes, you need to pass through a sound card to each VM for "reliability and especially latency".

If you are interested in the "sound vGPU / pulseaudio-passthrough" solution (with the latency and scratchy/laggy audio), it will work:

Code:

The Proxmox host needs a sound card and PulseAudio installed (this will be shared among the VMs; they send their audio through it):

1. PROXMOX host config:

/etc/pulse/default.pa
------------------------------------------
load-module module-native-protocol-unix auth-group=audio socket=/tmp/pulseserver
------------------------------------------

/etc/pulse/daemon.conf
------------------------------------------
exit-idle-time=-1
------------------------------------------

/etc/pulse/client.conf
------------------------------------------
default-server = unix:/tmp/pulseserver
autospawn = no
------------------------------------------

/etc/systemd/system/pulsedaemon.service
------------------------------------------
[Unit]
Description=Sound Server

[Service]
User=pulseaudio
ExecStart=/usr/bin/pulseaudio
Restart=on-failure

[Install]
WantedBy=sound.target
------------------------------------------


$> systemctl enable pulsedaemon.service
$> systemctl restart pulsedaemon.service
$> systemctl status pulsedaemon.service
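
Before moving on to the VM side, it may help to confirm that the socket from step 1 actually exists. A minimal sketch, assuming the /tmp/pulseserver path from the config above; the function name check_pulse_socket is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: report whether the PulseAudio unix socket exists.
# $1 is the socket path (defaults to the path used in default.pa above).
check_pulse_socket() {
    sock="${1:-/tmp/pulseserver}"
    if [ -S "$sock" ]; then
        echo "socket present: $sock"
        return 0
    fi
    echo "socket missing: $sock" >&2
    return 1
}
```

If the socket is there, running `pactl --server=unix:/tmp/pulseserver info` as a user in the audio group should print the server details, since the module above is loaded with auth-group=audio.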


2. VIRTUAL-MACHINE (VM) config:

Add the following line to the VM's config:

/etc/pve/qemu-server/xxx.conf
------------------------------------------
args: -device ich9-intel-hda,bus=pci.0,addr=0x1b -device hda-micro,audiodev=hda -audiodev pa,id=hda,server=unix:/tmp/pulseserver,out.mixing-engine=off
------------------------------------------
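
As an alternative to hand-editing the file, the same string could probably be applied with qm set. A minimal sketch, assuming VMID 100 as a placeholder and a made-up helper name pulse_vm_args; the device and audiodev options are copied unchanged from the args line above:

```shell
#!/bin/sh
# Hypothetical helper: print the QEMU args string from the post.
# $1 is the PulseAudio socket path (defaults to /tmp/pulseserver).
pulse_vm_args() {
    sock="${1:-/tmp/pulseserver}"
    printf '%s' "-device ich9-intel-hda,bus=pci.0,addr=0x1b -device hda-micro,audiodev=hda -audiodev pa,id=hda,server=unix:${sock},out.mixing-engine=off"
}
```

Usage, as root on the Proxmox host: `qm set 100 --args "$(pulse_vm_args)"`. Replace 100 with the real VMID, and restart the VM for the new arguments to take effect.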
