GPU passthrough

andreviana

Member
Oct 21, 2020
Hi there,

I've been using Proxmox with GPU passthrough for a while and everything has been going well. Now I have a new computer that suspends the VM when it tries to use the GPU.

I noticed the following errors in my syslog:

Jun 14 13:48:28 myhostname kernel: [26364.427693] pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
Jun 14 13:48:28 myhostname kernel: [26364.427698] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Jun 14 13:48:28 myhostname kernel: [26364.427765] pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000040/00002000
Jun 14 13:48:38 myhostname kernel: [26374.719550] pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
Jun 14 13:48:38 myhostname kernel: [26374.719555] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Jun 14 13:48:38 myhostname kernel: [26374.719641] pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000040/00002000
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0d:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
Jun 14 13:49:00 myhostname kernel: [26396.902033] pcieport 0000:00:03.0: AER: Uncorrected (Fatal) error received: 0000:00:03.0
Jun 14 13:49:00 myhostname kernel: [26396.902039] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
Jun 14 13:49:00 myhostname kernel: [26396.902145] pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000020/00000000
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0d:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0a:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0a:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest

The motherboard is an Asus Z10PE-D16 WS with four RTX 3090 cards.

Has anyone seen this error before?
Thanks in advance,
André
 
I had the same issues with an Asus P10S-i; it turned out to be a buggy BIOS.
The two latest BIOS versions for this motherboard (4602 and 4503) have the same issues.

It works with this, but it's not clean, so I stuck with 4401.
You can try upgrading or downgrading the BIOS.
 
You could try adding pci=noaer to the kernel parameters, but I'm not sure whether that would fix the problem or just hide it. Have you tried changing the PCIe slot or swapping two GPUs, to see if it might be a hardware issue?
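
If you do swap slots or cards, it helps to compare which port reports the errors before and after. Here is a minimal Python sketch for that, assuming the kernel messages keep the same format as in the syslog excerpt posted above. The function name tally_aer and the script name aer_tally.py are made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches kernel AER summary lines such as:
#   pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
# Assumption: the message layout matches the excerpt in this thread.
AER_LINE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AER: PCIe Bus Error: "
    r"severity=(?P<severity>[^,]+), type=(?P<layer>[^,]+),"
)


def tally_aer(lines):
    """Count AER bus errors per (device, severity, layer)."""
    counts = Counter()
    for line in lines:
        m = AER_LINE.search(line)
        if m:
            counts[(m["dev"], m["severity"], m["layer"])] += 1
    return counts


if __name__ == "__main__":
    # Read log lines from stdin and print one summary row per combination.
    for (dev, severity, layer), n in sorted(tally_aer(sys.stdin).items()):
        print(f"{dev}  {severity:<22} {layer:<18} {n}")
```

For example, run `journalctl -k | python3 aer_tally.py` before and after the swap. If the errors move to a different port along with the card, the card or its riser is the likelier suspect; if they stay on 0000:00:03.0, the slot, board, or BIOS is more likely.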
 
Thank you guys for your help.
I tried the first option (vendor reset module) and it didn't work.
I'm trying pci=noaer now.
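
After the reboot it is worth confirming that the parameter actually reached the kernel. A small Python sketch, assuming a Linux host where /proc/cmdline is readable; the helper name has_pci_option is hypothetical. It also handles the comma-separated form (for example pci=nomsi,noaer), since the kernel accepts several pci= options in one parameter.

```python
def has_pci_option(cmdline, option="noaer"):
    """Return True if a pci=... kernel parameter in cmdline includes option."""
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= takes a comma-separated list of options.
            if option in token[len("pci="):].split(","):
                return True
    return False


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as f:
        print("pci=noaer active:", has_pci_option(f.read()))
```

Note that this only matches the spelling with an equals sign; a token written as pci-noaer would not be reported as active.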
 
