Hi there,
I've been using Proxmox with GPU passthrough for a while and everything has been going well, but on a new computer the VM gets suspended as soon as it tries to use the GPU.
I noticed the following errors in my syslog:
Jun 14 13:48:28 myhostname kernel: [26364.427693] pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
Jun 14 13:48:28 myhostname kernel: [26364.427698] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Jun 14 13:48:28 myhostname kernel: [26364.427765] pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000040/00002000
Jun 14 13:48:38 myhostname kernel: [26374.719550] pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
Jun 14 13:48:38 myhostname kernel: [26374.719555] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Jun 14 13:48:38 myhostname kernel: [26374.719641] pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000040/00002000
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0d:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
Jun 14 13:49:00 myhostname kernel: [26396.902033] pcieport 0000:00:03.0: AER: Uncorrected (Fatal) error received: 0000:00:03.0
Jun 14 13:49:00 myhostname kernel: [26396.902039] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
Jun 14 13:49:00 myhostname kernel: [26396.902145] pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000020/00000000
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0d:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0a:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
Jun 14 13:49:00 myhostname QEMU[4427]: kvm: vfio_err_notifier_handler(0000:0a:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
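In case it helps anyone reading the log, here is a small Python sketch for turning the `error status/mask=` values into names. The bit tables are my own transcription of the AER Correctable and Uncorrectable Error Status register layouts from the PCIe spec and are not complete, and the function names are just made up for this post, so treat the output as a hint rather than a diagnosis.

```python
import re

# Assumption: bit positions transcribed from the PCIe AER
# Correctable Error Status register; partial list only.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

# Assumption: bit positions transcribed from the PCIe AER
# Uncorrectable Error Status register; partial list only.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}

# Matches the "error status/mask=XXXXXXXX/XXXXXXXX" part of a kernel AER line.
STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_bits(value, table):
    """Return a name for every set bit in a 32-bit register value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(table.get(bit, f"bit {bit} (not in table)"))
    return names


def decode_aer_line(line, uncorrectable):
    """Decode status and mask from one syslog line, or return None if absent.

    Pass uncorrectable=True for lines that follow an "Uncorrected" report,
    False for lines that follow a "Corrected" report.
    """
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    table = UNCORRECTABLE_BITS if uncorrectable else CORRECTABLE_BITS
    return {"status": decode_bits(status, table), "mask": decode_bits(mask, table)}


if __name__ == "__main__":
    corrected = "pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000040/00002000"
    fatal = "pcieport 0000:00:03.0: AER: device [8086:2f08] error status/mask=00000020/00000000"
    print(decode_aer_line(corrected, uncorrectable=False))
    print(decode_aer_line(fatal, uncorrectable=True))
```

If my tables are right, the corrected errors decode to Bad TLP (with Advisory Non-Fatal masked) and the fatal one decodes to Surprise Down Error, all reported by the root port at 0000:00:03.0.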
It's an Asus Z10PE-D16 WS motherboard with four RTX 3090 cards.
Has anyone run into this error before?
Thanks in advance,
André