Hello All
I'm facing the same issue. The first start with GPU passthrough worked well; then I installed the AMD drivers in the Windows VM and shut the VM down.
Now the VM doesn't boot and I'm getting:
Code:
Jan 11 19:45:57 pve kernel: [173901.567280] vfio-pci 0000:0a:00.0: Refused to change power state, currently in D3
Jan 11 19:45:58 pve kernel: [173902.383317] vfio-pci 0000:0a:00.0: timed out waiting for pending transaction; performing function level reset anyway
Jan 11 19:45:59 pve kernel: [173903.631345] vfio-pci 0000:0a:00.0: not ready 1023ms after FLR; waiting
Jan 11 19:46:00 pve kernel: [173904.687391] vfio-pci 0000:0a:00.0: not ready 2047ms after FLR; waiting
Jan 11 19:46:03 pve kernel: [173906.895475] vfio-pci 0000:0a:00.0: not ready 4095ms after FLR; waiting
Jan 11 19:46:07 pve kernel: [173911.247530] vfio-pci 0000:0a:00.0: not ready 8191ms after FLR; waiting
Jan 11 19:46:15 pve kernel: [173919.695770] vfio-pci 0000:0a:00.0: not ready 16383ms after FLR; waiting
Jan 11 19:46:34 pve kernel: [173937.872270] vfio-pci 0000:0a:00.0: not ready 32767ms after FLR; waiting
...
0a:00 is the PCI ID of my VGA card.
I can't figure out how to change the D3 state of the device.
I'm using an X399 AORUS PRO as the motherboard for my Proxmox server.
I can't find any information in https://pve.proxmox.com/wiki/Pci_passthrough related to this kind of issue.
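For anyone who wants to see which power state the kernel reports for the card, here is a minimal Python sketch that only reads sysfs and changes nothing. It assumes the device address 0000:0a:00.0 from the logs above, and it assumes the running kernel exposes the `power_state`, `power/runtime_status` and `power/control` attributes (older kernels may not have `power_state`, in which case it comes back as None). The function name `read_pci_power_info` and the `sysfs_root` parameter are my own, for illustration only.

```python
from pathlib import Path


def read_pci_power_info(bdf: str, sysfs_root: str = "/sys") -> dict:
    """Return power-related sysfs attributes for one PCI device.

    Attributes the running kernel does not expose are reported as None.
    """
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf
    attrs = {
        # Current PCI power state, e.g. D0 or D3hot (assumed available).
        "power_state": dev / "power_state",
        # Runtime PM status, e.g. active or suspended.
        "runtime_status": dev / "power" / "runtime_status",
        # Runtime PM policy: "auto" lets the kernel suspend, "on" keeps it awake.
        "control": dev / "power" / "control",
    }
    info = {}
    for name, path in attrs.items():
        try:
            info[name] = path.read_text().strip()
        except OSError:
            # Missing attribute, missing device, or no permission.
            info[name] = None
    return info
```

On the host, `print(read_pci_power_info("0000:0a:00.0"))` would show whatever the kernel currently reports. It is only a diagnostic and does not bring the card out of D3.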
Edit: OK, I've edited GRUB to disable power management of all PCIe ports with:
And restarted the server. It seemed to work when I started the VM. When I rebooted the VM, the whole Proxmox host became unresponsive (I was able to ping Proxmox, but no VM was working and everything was frozen).
So I removed that parameter... and I don't know what to do.
Last errors in the logs:
Code:
Jan 13 23:13:05 pve kernel: [ 218.821350] vfio-pci 0000:0a:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
Jan 13 23:18:20 pve kernel: [ 533.620197] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:20 pve kernel: [ 533.869587] vfio-pci 0000:0a:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jan 13 23:18:20 pve kernel: [ 534.115141] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:20 pve kernel: [ 534.238700] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:20 pve kernel: [ 534.485595] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:20 pve kernel: [ 534.499839] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0a:00.0 address=0x10369b0b60]
Jan 13 23:18:21 pve kernel: [ 534.624183] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:21 pve kernel: [ 534.753249] vfio-pci 0000:0a:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jan 13 23:18:21 pve kernel: [ 534.753768] vfio-pci 0000:0a:00.0: vfio_bar_restore: reset recovery - restoring BARs
...
EDIT 2021: It has been confirmed that my issue was caused by the following:
- My motherboard has a setting called "IOMMU" (in the Chipset tab). That setting uses "Auto" as its default value. On this motherboard, Auto = Disabled. I had to explicitly set it to "Enabled".
- My GPU was an AMD 'RX5700 XT'. It is not compatible with VFIO because the kernel module for that GPU is poorly written and cannot reset the hardware into a state that VFIO can work with: https://github.com/gnif/vendor-reset
- The Linux kernel of Proxmox needed an update to fix the earlier D3 state issue. But even with the current update, I'm blocked because of the previous point.
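Since the "Auto" BIOS setting silently left the IOMMU off, a quick check on the host can save some time. Below is a rough Python sketch, assuming the usual sysfs layout where active IOMMU groups appear under /sys/kernel/iommu_groups and each PCI device has an `iommu_group` symlink. The function names and the `sysfs_root` parameter are hypothetical helpers of mine, not part of Proxmox or any library.

```python
import os
from typing import Optional


def iommu_is_active(sysfs_root: str = "/sys") -> bool:
    """True if the kernel has created at least one IOMMU group."""
    groups = os.path.join(sysfs_root, "kernel", "iommu_groups")
    return os.path.isdir(groups) and len(os.listdir(groups)) > 0


def iommu_group_of(bdf: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the IOMMU group number of a PCI device, or None if it has none."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "iommu_group")
    if not os.path.islink(link):
        return None
    # The symlink points at .../iommu_groups/<N>; the last component is the group.
    return os.path.basename(os.path.realpath(link))
```

If `iommu_is_active()` returns False on the host, or `iommu_group_of("0000:0a:00.0")` returns None, the IOMMU is probably still disabled in the BIOS or on the kernel command line.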