GPU passthrough

andreviana

Oct 21, 2020
Hi there,

I've been using Proxmox with GPU passthrough for a while and everything has worked well, but now I have a new computer that suspends the VM when it tries to use the GPU.

I noticed the following errors in my syslog:

Jun 14 13:48:28 myhostname kernel: [26364.427693] pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
Jun 14 13:48:28 myhostname kernel: [26364.427698] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Jun 14 13:48:28 myhostname kernel: [26364.427765] pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000040/00002000
Jun 14 13:48:38 myhostname kernel: [26374.719550] pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
Jun 14 13:48:38 myhostname kernel: [26374.719555] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Jun 14 13:48:38 myhostname kernel: [26374.719641] pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000040/00002000
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0d:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
Jun 14 13:49:00 myhostname kernel: [26396.902033] pcieport 0000:00:03.0: AER: Uncorrected (Fatal) error received: 0000:00:03.0
Jun 14 13:49:00 myhostname kernel: [26396.902039] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
Jun 14 13:49:00 myhostname kernel: [26396.902145] pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000020/00000000
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0d:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0a:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0a:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
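
For anyone reading along, the `error status/mask` values in these AER lines are bitmasks, and decoding them can narrow down what the root port is complaining about. Below is a minimal Python sketch, assuming the bit layout of the PCIe AER correctable and uncorrectable status registers as named in the Linux kernel's `pci_regs.h`. The helper names `decode_status` and `decode_log` are hypothetical and made up for illustration.

```python
import re

# Bit positions in the AER Correctable Error Status register
# (assumed from the PCI_ERR_COR_* definitions in Linux's pci_regs.h).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

# Bit positions in the AER Uncorrectable Error Status register
# (assumed from the PCI_ERR_UNC_* definitions in Linux's pci_regs.h).
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}

SEVERITY_RE = re.compile(r"severity=(Corrected|Uncorrected)")
STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_status(status_hex, correctable):
    """Return the names of the bits set in an AER status value."""
    table = CORRECTABLE_BITS if correctable else UNCORRECTABLE_BITS
    value = int(status_hex, 16)
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


def decode_log(lines):
    """Pair each 'severity=' line with the 'error status/mask=' line after it."""
    correctable = None
    decoded = []
    for line in lines:
        severity = SEVERITY_RE.search(line)
        if severity:
            correctable = severity.group(1) == "Corrected"
            continue
        status = STATUS_RE.search(line)
        if status and correctable is not None:
            decoded.append(decode_status(status.group(1), correctable))
    return decoded
```

Under those assumptions, the corrected errors above (status `00000040`) decode to Bad TLP and the fatal one (status `00000020`) decodes to Surprise Down Error, which may suggest the link on port 0000:00:03.0 dropped unexpectedly rather than a problem inside the guest.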

The motherboard is an Asus Z10PE-D16 WS with four RTX 3090s.

Has anyone seen this error before?
Thanks in advance,
André
 
I had the same issue with an Asus P10S-i; it turned out to be a buggy BIOS.
The two latest BIOS versions for this motherboard (4602 and 4503) have the same issue.

It works with this, but it's not clean, so I stuck with 4401.
You can try upgrading or downgrading the BIOS.
 
You could try adding pci=noaer to the kernel parameters, but I'm not sure whether it would fix the problem or just hide it. Have you tried changing the PCIe slot or swapping two GPUs, to see if it might be a hardware issue?
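
If you go the pci=noaer route, it is worth confirming after the reboot that the parameter actually reached the running kernel. On Proxmox the parameter usually goes into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` followed by `update-grub`, or into `/etc/kernel/cmdline` followed by `proxmox-boot-tool refresh` on systemd-boot installs; which one applies depends on how the host boots, so treat that as an assumption about your setup. The Python sketch below only does the check, and the helper names `read_cmdline` and `has_kernel_param` are hypothetical.

```python
def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


def has_kernel_param(cmdline, param):
    """True if param appears as a whole, whitespace-separated token."""
    return param in cmdline.split()


if __name__ == "__main__":
    active = has_kernel_param(read_cmdline(), "pci=noaer")
    print("pci=noaer active" if active else "pci=noaer NOT active")
```

Matching whole tokens rather than substrings avoids a false positive on a longer parameter that merely starts with the same text.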
 
Thank you guys for your help.
I tried the first option (the vendor-reset module) and it didn't work.
I'm trying pci=noaer now.