PCIe Passthrough crashes the system

dline

Hi, all!

I have a problem with PCIe passthrough to my Windows 10 VM on Proxmox.
I tried to pass through this device (an audio card):

Bash:
02:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10)
    Subsystem: C-Media Electronics Inc CMI8738/C3DX PCI Audio Device
    Flags: stepping, medium devsel, IRQ 255
    I/O ports at f000 [disabled] [size=256]
    Capabilities: [c0] Power Management version 2
    Kernel modules: snd_cmipci

The VM sees the device, but as soon as it starts using it, the host crashes. I get 0.5-1.0 seconds of audio from the VM, and then the whole system restarts.

My config is:
- AMD Ryzen 5 3350G with Radeon Vega Graphics
- Gigabyte X570 UD (also tried with a Gigabyte B450M DS3H V2)

Kernel Version: Linux 5.4.73-1-pve
PVE: pve-manager/6.3-2/22f57405

The host is configured with:
- GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream"
- "blacklist snd_cmipci" in /etc/modprobe.d/blacklist.conf
- SVM & IOMMU enabled in the BIOS
- Modules (vfio, vfio_iommu_type1, vfio_pci, vfio_virqfd) in /etc/modules

How can I fix this issue?
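
The host settings listed above can be sanity-checked from a shell before starting the VM. The following is a minimal sketch, not a definitive procedure: the function names `driver_of` and `has_cmdline_flag` are made up for illustration, and the optional second argument of each exists only so the check can be pointed at a different path.

```shell
#!/bin/sh
# Sketch: verify passthrough prerequisites on the host.
# driver_of and has_cmdline_flag are hypothetical helper names.

# Print the kernel driver currently bound to a PCI device, or "none".
# $1 = full PCI address (e.g. 0000:02:00.0)
# $2 = sysfs device directory (assumed default: /sys/bus/pci/devices)
driver_of() {
    drv_link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$drv_link" ]; then
        basename "$(readlink "$drv_link")"
    else
        echo "none"
    fi
}

# Succeed if the exact flag appears on the kernel command line.
# $1 = flag (e.g. amd_iommu=on)
# $2 = command line file (assumed default: /proc/cmdline)
has_cmdline_flag() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For the card in this post, `driver_of 0000:02:00.0` would be expected to print `vfio-pci` (and not `snd_cmipci`) once the blacklist and vfio modules are in effect, and `has_cmdline_flag pcie_acs_override=downstream && echo present` shows whether the GRUB change actually reached the running kernel.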
 
PCIe passthrough is considered experimental (especially on consumer hardware) for a reason. In general, follow the usual troubleshooting procedure:
  • Check (and/or post) any and all log files you can gather: dmesg, /var/log/syslog, journalctl, Windows Event Log from the guest, etc.
  • Make sure the BIOS and firmware are up to date
  • Try different PCIe slots, or different hardware in the same slot, to see if the issue is with the card or the motherboard
  • Check BIOS settings (hot tip: AER is responsible for PCIe error detection and recovery; if you find such a setting, maybe play around with that?)
  • Potentially check IOMMU groupings (without pcie_acs_override, that is; otherwise everything will appear "separated" even though the devices can technically still communicate and cause issues. Generally speaking, try to avoid that setting if at all possible)
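
The IOMMU grouping check from the last bullet can be done by walking sysfs. This is a minimal sketch: the function name `list_iommu_groups` is hypothetical, and the optional argument is only an assumed override for the standard `/sys/kernel/iommu_groups` location.

```shell
#!/bin/sh
# Sketch: print every PCI device together with its IOMMU group.
# list_iommu_groups is a hypothetical helper name.
# $1 = groups directory (assumed default: /sys/kernel/iommu_groups)
list_iommu_groups() {
    groups_root=${1:-/sys/kernel/iommu_groups}
    if [ ! -d "$groups_root" ]; then
        echo "no IOMMU groups under $groups_root (is the IOMMU enabled?)" >&2
        return 1
    fi
    for dev in "$groups_root"/*/devices/*; do
        [ -e "$dev" ] || continue      # skip the unexpanded glob if empty
        group=${dev%/devices/*}        # strip "/devices/<address>"
        group=${group##*/}             # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done | sort -k2,2n -k3,3           # numeric by group, then by address
}
```

Run `list_iommu_groups` once with and once without `pcie_acs_override` on the kernel command line. Devices that share a group without the override are not truly isolated from each other, whatever the overridden listing suggests.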
 
Stefan, thank you for your response!

Here are the logs:
1. Syslog from HOST (crash time is after Jan 4 18:56:48)
2. dmesg
3. journalctl from HOST
4. Syslog from Guest (Ubuntu 20.04)

--
- The BIOS version is up to date (December 2020 release)
- The AER option is not visible in the BIOS :(
- Playing with pcie_acs_override does not help

I ran some experiments with an Ubuntu guest: audio plays for around 30 seconds (the value is random, from 5 to 47 seconds), but after that the host crashes :(

So... it works, which suggests that the CPU, motherboard, and PCI device can do this, but something makes the system completely unstable, and it always crashes.
 

There is nothing in the logs AFAICT... it's starting to sound like a hardware issue then. As I said, try a different PCIe device in the same slot, or a different slot for your soundcard.
 
According to the logs, device 0000:01:00.0 is also in IOMMU group 7. Can you tell us what device that is? Are you also passing that device to the same VM?

PS: pci 0000:01:00.0: [1b21:1080] is a PCI bridge, which should in principle not cause an issue. Unless maybe it does, and it breaks the PCI passthrough? My experience with passing through PCI devices is much less successful than with PCIe devices. Did the B85M-E motherboard also use an ASM1083/1085 PCIe to PCI bridge? Maybe it is part of the sound card? The X570 chipset is famous for having the best and most flexible PCIe passthrough for AMD Ryzen.
 
avw, yes, it was a PCI bridge.
 
I changed my system to the following config:
- Intel Core i5-9400
- GIGABYTE B365M D3H

PCI passthrough is working, but only with 2 audio cards (1 PCIe card and 1 PCI card). A third audio card doesn't start.
I get the following error when starting the VM:

Code:
kvm: -device vfio-pci,host=0000:04:00.0,id=hostpci0,bus=pci.0,addr=0x10: vfio 0000:04:00.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: Device or resource busy
TASK ERROR: start failed: QEMU exited with code 1

Some info about devices:

Bash:
lspci -v

04:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10)
    Subsystem: C-Media Electronics Inc CMI8738/C3DX PCI Audio Device
    Flags: stepping, medium devsel, IRQ 16
    I/O ports at e000 [size=256]
    Capabilities: [c0] Power Management version 2
    Kernel driver in use: vfio-pci
    Kernel modules: snd_cmipci
    
09:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10)
    Subsystem: C-Media Electronics Inc CMI8738/C3DX PCI Audio Device
    Flags: stepping, medium devsel, IRQ 255
    I/O ports at d000 [disabled] [size=256]
    Capabilities: [c0] Power Management version 2
    Kernel driver in use: vfio-pci
    Kernel modules: snd_cmipci

0b:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10)
    Subsystem: C-Media Electronics Inc CMI8738/C3DX PCI Audio Device
    Flags: stepping, medium devsel, IRQ 255
    I/O ports at c000 [disabled] [size=256]
    Capabilities: [c0] Power Management version 2
    Kernel driver in use: vfio-pci
    Kernel modules: snd_cmipci

Devices 09:00.0 and 0b:00.0 work fine, but 04:00.0 won't pass through. I tried swapping it for another identical card and moving it to a different PCI slot on the motherboard. I think the reason is that 04:00.0 and 09:00.0 are the same model, and KVM won't pass through two identical devices...

How can I fix it?
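
One commonly reported cause of the "VFIO_DEVICE_SET_IRQS failure: Device or resource busy" message for INTX-0 is that the card's legacy interrupt line is shared with another device, which would fit 04:00.0 being the only card above with a real IRQ (16) assigned. That diagnosis is an assumption here, not something the logs confirm. A minimal sketch for finding which PCI devices report a given IRQ in sysfs (the function name `devices_on_irq` is hypothetical, and the second argument is only an assumed path override):

```shell
#!/bin/sh
# Sketch: list PCI devices whose sysfs "irq" attribute equals a given number.
# devices_on_irq is a hypothetical helper name.
# $1 = IRQ number to look for (e.g. 16)
# $2 = sysfs device directory (assumed default: /sys/bus/pci/devices)
devices_on_irq() {
    want_irq=$1
    pci_root=${2:-/sys/bus/pci/devices}
    for irq_file in "$pci_root"/*/irq; do
        [ -r "$irq_file" ] || continue          # skip if the glob matched nothing
        if [ "$(cat "$irq_file")" = "$want_irq" ]; then
            dev_dir=${irq_file%/irq}            # strip the trailing "/irq"
            echo "${dev_dir##*/}"               # print only the PCI address
        fi
    done
}
```

`devices_on_irq 16` lists every device reporting IRQ 16. If anything besides 0000:04:00.0 shows up, comparing with the IRQ 16 line in `/proc/interrupts` shows which host drivers are attached to it. Note that the sysfs value may not reflect the legacy line for devices that currently use MSI.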
 
In my opinion, PCI passthrough is very unstable. It is better to use USB devices.
Yes, it is very hit or miss. Some devices lie about DMA, some devices lie about Function Level Reset (FLR), some devices don't (fully) comply with PCI standards...
There are just too few people that use both Linux AND PCI(e) passthrough for companies to make sure it works and for users to check beforehand whether a specific combination of motherboard and PCI device will work. Please, please, please tell us which sound cards did work for you.
 
For me, the following configuration works:

- Intel Core i5-9400
- GIGABYTE B365M D3H
- Audio card C-Media 8738LX (perfect sound for the money) https://www.dns-shop.ru/product/80eab564fcd33120/vnutrennaa-zvukovaa-karta-c-media-8738lx/
- Audio card ASUS Xonar SE (bad sound) https://www.dns-shop.ru/product/541c5b761d523332/vnutrennaa-zvukovaa-karta-asus-xonar-se/
- Audio card Creative SB AUDIGY FX 5.1 (perfect sound for the money) https://www.dns-shop.ru/product/bd0d11a171b23120/vnutrennaa-zvukovaa-karta-creative-sb-audigy-fx-51/

NB: it works only if different audio cards are connected to the PC. If 2 identical devices are connected, the system crashes.
 
I'm facing an issue similar to yours. In fact, I was about to open a new thread about it.

In my case I have a B450 Aorus M motherboard with two PCIe slots.

When I use either of my GPUs in the main PCIe slot, GPU passthrough works successfully!
But when I move either GPU to the other PCIe slot and then start the VM, the computer freezes and crashes.

I don't know whether this is a hardware issue, or whether that PCIe slot is simply unable to do GPU passthrough.

But the funny thing is that if I run a bare-metal installation of Windows, Linux, or macOS with the GPU in the secondary PCIe slot, it works OK. :( :(
 
The IOMMU groups of your motherboard are not very good for passthrough, as I explain here. The secondary PCIe slot is part of the chipset's group, and passing it through crashes Proxmox because the host also loses the network and disk drives. This can happen even when you ignore the groups using pcie_acs_override.
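
Whether a slot shares its IOMMU group with chipset devices can be checked per device. This is a minimal sketch: the function name `iommu_group_peers` is hypothetical, the optional second argument is only an assumed path override, and the GPU address must be taken from `lspci` on the machine in question.

```shell
#!/bin/sh
# Sketch: list the other devices in the same IOMMU group as one PCI device.
# iommu_group_peers is a hypothetical helper name.
# $1 = full PCI address of the device to inspect
# $2 = sysfs device directory (assumed default: /sys/bus/pci/devices)
iommu_group_peers() {
    peer_addr=$1
    peer_dir="${2:-/sys/bus/pci/devices}/$peer_addr/iommu_group/devices"
    if [ ! -d "$peer_dir" ]; then
        echo "no IOMMU group found for $peer_addr" >&2
        return 1
    fi
    for peer in "$peer_dir"/*; do
        [ -e "$peer" ] || continue
        peer_name=${peer##*/}
        # Print every group member except the device itself.
        [ "$peer_name" = "$peer_addr" ] || echo "$peer_name"
    done
    return 0
}
```

If the output for the GPU in the secondary slot includes the network controller or a SATA/NVMe controller, starting the VM takes those away from the host as well, which matches the freeze described above.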
 
Thanks for replying!

Yep, that's what happens to me: when I start the VM, the network and drives lose their connection.

Even when applying the pcie override.
In fact, when I do this, the GPU plugged into the second PCIe slot disappears from the list. :(

I'm going to save money to buy a motherboard with the chipset that was suggested to me in the other thread.

Thanks. :)