[SOLVED] vm crashes with - vfio_bar_restore: reset recovery - restoring BARs

kryptys100

New Member
Jul 26, 2022
hello everyone!

I am trying to pass through a 5700 XT. For whatever reason, Proxmox boots normally and runs a VM successfully with the GPU passed through, but when I go through the OpenCore bootloader, the VM crashes with the error: vfio_bar_restore: reset recovery - restoring BARs

I'm fairly new to all of this. I could get it to boot previously, then all of a sudden this happened, with no changes whatsoever.
Any help is greatly appreciated!

My GRUB config looks like this:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream,multifunction efifb:off kvm.ignore_msrs=1"

This is the syslog I get:
Code:
Jul 25 18:24:18 pve qm[1586]: <root@pam> starting task UPID:pve:00000636:00001101:62DF2622:qmstart:101:root@pam:
Jul 25 18:24:18 pve qm[1590]: start VM 101: UPID:pve:00000636:00001101:62DF2622:qmstart:101:root@pam:
Jul 25 18:24:18 pve kernel: amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.
Jul 25 18:24:18 pve kernel: Console: switching to colour dummy device 80x25
Jul 25 18:24:18 pve kernel: amdgpu 0000:06:00.0: amdgpu: Fail to disable thermal alert!
Jul 25 18:24:18 pve kernel: [drm] free PSP TMR buffer
Jul 25 18:24:18 pve kernel: [drm] amdgpu: ttm finalized
Jul 25 18:24:18 pve kernel: vfio-pci 0000:06:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Jul 25 18:24:19 pve systemd[1]: Created slice qemu.slice.
Jul 25 18:24:19 pve systemd[1]: Started 101.scope.
Jul 25 18:24:19 pve systemd-udevd[1595]: Using default interface naming scheme 'v247'.
Jul 25 18:24:19 pve systemd-udevd[1595]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 25 18:24:19 pve kernel: device tap101i0 entered promiscuous mode
Jul 25 18:24:19 pve systemd-udevd[1595]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 25 18:24:19 pve systemd-udevd[1594]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 25 18:24:19 pve systemd-udevd[1594]: Using default interface naming scheme 'v247'.
Jul 25 18:24:19 pve systemd-udevd[1595]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 25 18:24:19 pve kernel: vmbr0: port 2(fwpr101p0) entered blocking state
Jul 25 18:24:19 pve kernel: vmbr0: port 2(fwpr101p0) entered disabled state
Jul 25 18:24:19 pve kernel: device fwpr101p0 entered promiscuous mode
Jul 25 18:24:19 pve kernel: vmbr0: port 2(fwpr101p0) entered blocking state
Jul 25 18:24:19 pve kernel: vmbr0: port 2(fwpr101p0) entered forwarding state
Jul 25 18:24:19 pve kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
Jul 25 18:24:19 pve kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
Jul 25 18:24:19 pve kernel: device fwln101i0 entered promiscuous mode
Jul 25 18:24:19 pve kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
Jul 25 18:24:19 pve kernel: fwbr101i0: port 1(fwln101i0) entered forwarding state
Jul 25 18:24:19 pve kernel: fwbr101i0: port 2(tap101i0) entered blocking state
Jul 25 18:24:19 pve kernel: fwbr101i0: port 2(tap101i0) entered disabled state
Jul 25 18:24:19 pve kernel: fwbr101i0: port 2(tap101i0) entered blocking state
Jul 25 18:24:19 pve kernel: fwbr101i0: port 2(tap101i0) entered forwarding state
Jul 25 18:24:21 pve kernel: vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
Jul 25 18:24:21 pve kernel: vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
Jul 25 18:24:21 pve kernel: vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x25@0x400
Jul 25 18:24:21 pve kernel: vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
Jul 25 18:24:21 pve kernel: vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
Jul 25 18:24:23 pve qm[1586]: <root@pam> end task UPID:pve:00000636:00001101:62DF2622:qmstart:101:root@pam: OK
Jul 25 18:24:55 pve QEMU[1608]: kvm: vfio: Unable to power on device, stuck in D3
Jul 25 18:24:55 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:55 pve QEMU[1608]: kvm: vfio: Unable to power on device, stuck in D3
Jul 25 18:24:55 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:57 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:57 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
Jul 25 18:24:58 pve kernel: vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs

thank you in advance for all of your help!
 
It's very hard to debug such things. PCIe passthrough is still not production-ready on every possible hardware. Some hardware works really well, but often it does not: you will get guest and host crashes, strange behaviour, lockups, etc. In my experience (I tried a lot of different hardware before it eventually worked), modern hardware works better than older hardware, NICs work better than GPUs, and so on. Every kernel also introduces new features and sometimes new bugs, so you also need to iterate over kernels; unfortunately there is no "one solution" to every problem.
 
Ok, that’s a very good answer and I’m very appreciative, but I do have one more thought! I did a little more digging and found out that I can get it booted with the GPU passthrough working flawlessly if I:
1) reset the NVRAM (running macOS on this machine, by the way)
2) change the SMBIOS

I’m not sure if that helps at all but I just thought I’d mention it
 
Yes, I also needed to do this once. The problem was that even just booting different VMs (with different OSes and drivers) with the same passthrough card sometimes still messed it up and left the card not working, and I needed to reboot to resolve it.

If you found a solution that works for you, that's great!
 
amd_iommu=on is not needed because it is on by default. Do you really need pcie_acs_override=downstream,multifunction (but we can sort that out later)? efifb:off does nothing. You probably meant video=efifb:off but that won't help with kernel 5.15 and later.
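A minimal sketch, assuming a bash/sh shell on the Proxmox host: the hypothetical helper `check_cmdline` below only flags the options discussed in the reply above when given a kernel command line as one string (for example the `GRUB_CMDLINE_LINUX_DEFAULT` value or the contents of `/proc/cmdline`). It does not edit anything; on a GRUB-booted host the actual change is made in `/etc/default/grub`, followed by `update-grub` and a reboot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag the kernel command-line options discussed above.
# The function body runs in a subshell so that "set -f" (no globbing while
# the options are split on whitespace) does not leak into the caller.
check_cmdline() (
  set -f
  for opt in $1; do
    case "$opt" in
      amd_iommu=on)
        echo "note: amd_iommu=on is not needed, it is on by default" ;;
      efifb:off)
        echo "warning: efifb:off does nothing, you probably meant video=efifb:off" ;;
      video=efifb:off)
        echo "note: video=efifb:off won't help with kernel 5.15 and later" ;;
    esac
  done
)
```

For example, `check_cmdline "$(cat /proc/cmdline)"` checks the options the running kernel was actually booted with, which also confirms whether an edit to the GRUB config took effect.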

I think the actual problem is that the AMD Radeon RX 5000 series has known reset issues. Have you tried vendor-reset? I used this guide.
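For context, vendor-reset (gnif/vendor-reset on GitHub) is built as a DKMS module (`dkms install .` in a clone of the repository, with git, dkms, build-essential and the pve-headers for the running kernel installed) and loaded at boot by listing `vendor-reset` in `/etc/modules`. On kernel 5.15 and later, the GPU's reset method reportedly also has to be switched to `device_specific` via sysfs before vendor-reset's reset is used. The sketch below wraps only that last step; `set_reset_method` and the `SYSFS_ROOT` test override are hypothetical names, the vendor-reset module is assumed to be loaded, and `0000:06:00.0` is the GPU address from the syslog above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: select a reset method (default: device_specific, the
# one vendor-reset hooks into) for a PCI device. Needs root and a kernel that
# exposes the reset_method attribute (5.15+). SYSFS_ROOT is an assumption,
# added only so the function can be exercised against a fake sysfs tree.
set_reset_method() {
  local addr="$1" method="${2:-device_specific}"
  local attr="${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr/reset_method"
  if [ ! -e "$attr" ]; then
    echo "no reset_method attribute for $addr (kernel older than 5.15?)" >&2
    return 1
  fi
  # The kernel rejects methods the device does not support, so check the write.
  if ! echo "$method" > "$attr"; then
    echo "could not set reset method '$method' for $addr" >&2
    return 1
  fi
  echo "$addr: reset_method is now $(cat "$attr")"
}
```

Run it as root, e.g. `set_reset_method 0000:06:00.0`, before starting the VM. Values written to sysfs do not survive a reboot, so this has to be repeated after every host boot.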
 
Alright! I’ll give it a try! I’ve tried to use it before and it returned some weird error with the
Code:
dkms install .
command, but I’d be more than happy to give it another go.

And thanks for the advice with the
Code:
pcie_acs_override=downstream,multifunction
I wasn’t entirely sure what it did. Glad to know that it makes no difference lol, I’ll get rid of those!

I’ll get back to you about my results
 
I can't help without any information about what went wrong, such as an error message.
Did you install everything required to build it (as per the guide)?
About pcie_acs_override=downstream,multifunction: that particular one might actually be needed, depending on your motherboard and which PCIe slot you use for the device.
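Whether the ACS override is needed can be checked by listing the IOMMU groups: if the GPU (06:00.0) and its audio function (06:00.1) sit in a group of their own, the override is probably unnecessary; if other devices share that group, the override may be what allows the card to be passed through on its own (at the cost of weaker isolation). A minimal sketch of the usual sysfs walk, with `list_iommu_groups` and the `IOMMU_ROOT` test override as hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line per device as "IOMMU group N: address",
# sorted by group number. IOMMU_ROOT is an assumption, added only for testing;
# it defaults to the real sysfs location of the IOMMU groups.
list_iommu_groups() {
  local root="${IOMMU_ROOT:-/sys/kernel/iommu_groups}" dev group
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue        # glob matched nothing (no IOMMU groups)
    group="${dev#"$root"/}"          # e.g. 14/devices/0000:06:00.0
    group="${group%%/*}"             # e.g. 14
    printf 'IOMMU group %s: %s\n' "$group" "${dev##*/}"
  done | sort -n -k3                 # glob order is lexical ("10" < "2")
}
```

Boot once without `pcie_acs_override` before checking, since the override itself changes the grouping. `lspci -nns <address>` shows which device each address belongs to, and empty output means no IOMMU groups exist, i.e. the IOMMU is not active.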
 
Yes, I did. It now returns the same error:
Code:
DKMS: add completed.
Error! Your kernel headers for kernel 5.13.19-6-pve cannot be found.
Please install the linux-headers-5.13.19-6-pve package.
or use the --kernelsourcedir option to tell DKMS where it's located
Any thoughts?
 
Yes, install the right kernel headers: apt-get update followed by apt-get install pve-headers-5.13.19-6-pve. It appears that it (also) wants to build the module for older kernels that you have installed, which is fine. (Or apt-get dist-upgrade your system, reboot and remove pve-kernel-5.13.)
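As a rough check for this kind of DKMS error: by default DKMS looks for a kernel's headers through `/lib/modules/<version>/build`, so installed kernels without a valid `build` link are the ones it cannot build for. A minimal sketch, where `kernels_missing_headers` and the `MODULES_ROOT` test override are hypothetical names and the suggested package name simply follows the `pve-headers-<version>` pattern used in this thread (other kernel series may use a different package name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list installed kernels whose headers DKMS cannot find,
# i.e. /lib/modules/<version>/build is missing or a dangling symlink
# ([ -e ] follows the link). MODULES_ROOT is an assumption, added for testing.
kernels_missing_headers() {
  local root="${MODULES_ROOT:-/lib/modules}" dir ver
  for dir in "$root"/*; do
    [ -d "$dir" ] || continue
    ver="${dir##*/}"
    if [ ! -e "$dir/build" ]; then
      # Package naming assumed from the pve-headers-<version> example above.
      echo "$ver: no headers, try: apt-get install pve-headers-$ver"
    fi
  done
}
```

Each reported kernel either needs its headers package installed or can be removed if it is no longer used, as suggested above.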
 
That seemed to solve the problem! Thank you so so much, I’ve been stuck on that for months! I’m very appreciative!