[SOLVED] GPU passthrough "AER: can't recover (no error_detected callback)"

turborierer

Jun 25, 2022
Hi all,

I've been struggling for two weeks now to get GPU passthrough running again. It used to work, but all of a sudden the VM started displaying the message:
Code:
guest has not initialized the display (yet)

So I tried many variants in the GRUB file, all starting with:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_acs_override=downstream,multifunction video=efifb:off video=vesa:off vfio-pci.ids=10de:1f06,10de:10f9,10de:1ada,10de:1adb vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 modprobe.blacklist=radeon,nouveau,nvidia,nvidiafb,nvidia-gpu"
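To see at a glance what a line like this asks the kernel to do, here is a small Python sketch that splits a `GRUB_CMDLINE_LINUX_DEFAULT` value into individual parameters and expands comma-separated lists such as `vfio-pci.ids` and `modprobe.blacklist`. It only parses the string and changes nothing on the host; `parse_cmdline` is a hypothetical helper, and it assumes plain whitespace-separated parameters (no quoted values containing spaces).

```python
from __future__ import annotations

import shlex


def parse_cmdline(line: str) -> dict[str, list[str] | None]:
    """Split a GRUB_CMDLINE_LINUX_DEFAULT="..." line into kernel parameters.

    Flags without '=' (e.g. 'quiet') map to None; 'key=a,b' maps to ['a', 'b'].
    Repeated keys (e.g. two 'video=' entries) are merged into one list.
    """
    # Drop the shell variable name, then let shlex remove the surrounding quotes.
    _, _, value = line.partition("=")
    value = " ".join(shlex.split(value))
    params: dict[str, list[str] | None] = {}
    for token in value.split():
        key, sep, val = token.partition("=")
        if not sep:
            # Bare flag such as 'quiet'.
            params[key] = None
            continue
        values = params.get(key) or []
        values.extend(val.split(","))
        params[key] = values
    return params


if __name__ == "__main__":
    line = (
        'GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on '
        "video=efifb:off video=vesa:off "
        "vfio-pci.ids=10de:1f06,10de:10f9,10de:1ada,10de:1adb "
        'modprobe.blacklist=radeon,nouveau,nvidia,nvidiafb,nvidia-gpu"'
    )
    params = parse_cmdline(line)
    print(params["vfio-pci.ids"])  # ['10de:1f06', '10de:10f9', '10de:1ada', '10de:1adb']
    print(params["video"])  # ['efifb:off', 'vesa:off']
```

This makes it easy to check, for example, that all four functions of the card (the `10de:` IDs) are listed for `vfio-pci`.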

This led to the following errors on the host:
Code:
[   37.872433] nvidia-gpu 0000:02:00.3: AER: can't recover (no error_detected callback)
[   37.872436] pcieport 0000:00:02.0: AER: device recovery failed
[   37.872437] pcieport 0000:00:02.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:02:00.0
[   37.872589] vfio-pci 0000:02:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[   37.875688] vfio-pci 0000:02:00.0:   device [10de:1f06] error status/mask=00100000/00000000
[   37.877501] vfio-pci 0000:02:00.0:    [20] UnsupReq               (First)
[   37.879289] vfio-pci 0000:02:00.0: AER:   TLP Header: 40000008 000002ff e0062000 f7f7f7f7
[   37.881334] xhci_hcd 0000:02:00.2: AER: can't recover (no error_detected callback)
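The `error status/mask=00100000/00000000` field in this log is a raw dump of the PCIe Uncorrectable Error Status register and its mask. As a rough illustration of how that value maps to the `[20] UnsupReq` line, the Python sketch below decodes such a pair bit by bit. The bit positions follow the PCIe AER Uncorrectable Error Status register; the short labels are modeled on the kernel's AER messages, and `decode_uncor_status` is a hypothetical helper, not a kernel or Proxmox tool.

```python
from __future__ import annotations

# Bit positions in the PCIe AER Uncorrectable Error Status register,
# with short labels in the style of the kernel's AER log messages.
UNCOR_ERRORS = {
    4: "DLP",               # Data Link Protocol Error
    5: "SDES",              # Surprise Down Error
    12: "TLP",              # Poisoned TLP
    13: "FCP",              # Flow Control Protocol Error
    14: "CmpltTO",          # Completion Timeout
    15: "CmpltAbrt",        # Completer Abort
    16: "UnxCmplt",         # Unexpected Completion
    17: "RxOF",             # Receiver Overflow
    18: "MalfTLP",          # Malformed TLP
    19: "ECRC",             # ECRC Error
    20: "UnsupReq",         # Unsupported Request
    21: "ACSViol",          # ACS Violation
    22: "UncorrIntErr",     # Uncorrectable Internal Error
    23: "BlockedTLP",       # MC Blocked TLP
    24: "AtomicOpBlocked",  # AtomicOp Egress Blocked
    25: "TLPBlockedErr",    # TLP Prefix Blocked
}


def decode_uncor_status(status_mask: str) -> list[tuple[int, str]]:
    """Decode a 'status/mask' pair such as '00100000/00000000'.

    Returns (bit, label) for every bit that is set in status and not masked.
    """
    status_hex, _, mask_hex = status_mask.partition("/")
    status = int(status_hex, 16)
    mask = int(mask_hex, 16) if mask_hex else 0
    active = status & ~mask  # only unmasked errors are reported
    return [
        (bit, UNCOR_ERRORS.get(bit, "Undefined"))
        for bit in range(32)
        if (active >> bit) & 1
    ]


if __name__ == "__main__":
    print(decode_uncor_status("00100000/00000000"))  # [(20, 'UnsupReq')]
```

Since 0x00100000 is exactly bit 20, the value in the log corresponds to a single Unsupported Request error, matching the `[20] UnsupReq` line.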

Then I changed the GRUB config by adding the
Code:
pcie_aspm=off
kernel parameter, which I found somewhere on the net, and indeed the VM booted again, but now it gets stuck here:

Code:
Started Update UTMP about System Runlevel Changes
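Adding a kernel parameter such as `pcie_aspm=off` means editing the `GRUB_CMDLINE_LINUX_DEFAULT` line in `/etc/default/grub`, then regenerating the boot configuration (with `update-grub` on a host that boots via GRUB) and rebooting. The Python sketch below only illustrates the string edit itself: it appends a parameter to the quoted value if that exact token is not already present. `add_kernel_param` is a hypothetical helper; it does not read or write any files and does not detect a conflicting value such as a different `pcie_aspm=` setting.

```python
import re


def add_kernel_param(line: str, param: str) -> str:
    """Append `param` to a GRUB_CMDLINE_LINUX_DEFAULT="..." line if missing."""
    match = re.fullmatch(r'(GRUB_CMDLINE_LINUX_DEFAULT=)"(.*)"', line.strip())
    if match is None:
        raise ValueError('expected GRUB_CMDLINE_LINUX_DEFAULT="..."')
    prefix, value = match.groups()
    tokens = value.split()
    # Only add the parameter once, so re-running the edit is harmless.
    if param not in tokens:
        tokens.append(param)
    return f'{prefix}"{" ".join(tokens)}"'


if __name__ == "__main__":
    before = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"'
    print(add_kernel_param(before, "pcie_aspm=off"))
    # GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_aspm=off"
```

Keeping the edit idempotent avoids accumulating duplicate parameters when trying many GRUB variants in a row.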

Some research suggested removing and reinstalling GNOME, but after rebooting, the VM stops at exactly the same point.

I'm at the end of my Linux knowledge; maybe someone can spot the real bug.

The machine is an HP Z440 workstation with a GeForce RTX 2060 SUPER.

Thank you so much.
 
Nothing changed? You didn't switch from pve-kernel-5.13 to version 5.15? Maybe unplug and replug the GPU (a few times, to remove corrosion from the PCIe connectors)?
 
Actually, I already tried a second GPU, with exactly the same problem.
I'm running pve-manager/7.2-3/c743d6c1 (running kernel: 5.15.30-2-pve).
 
