vfio-pci Refused to change power state, currently in D3

DomF

Nov 7, 2017
Hi,

In an attempt to make my home server more stable, I decided to update both PVE (running the community release 6.0) and the BIOS of my motherboard (Asus B350 Prime Plus) to the latest release. Now I've noticed that my GPU passthrough doesn't work anymore, and dmesg shows the following message for the address of my NVIDIA graphics card: vfio-pci Refused to change power state, currently in D3.

Any advice how to proceed?
 
Googling this message suggests it's an issue with the motherboard BIOS. The version I installed is 5220, which ships AGESA 1.0.0.3ABBA (I updated from version 4022, which was previously on the board). I tried rolling the BIOS back to other earlier releases, which I was able to flash using the built-in EZ Flash tool, but none of them resolved the stuck D3 power state of the graphics card. I tried to reinstall version 4022, but the tool refused, saying the .cap file isn't a valid file. I'm still looking for a way to re-flash the old BIOS onto the motherboard.
 
Hello All

I'm facing the same issue. The first start with GPU passthrough worked well; then I installed the AMD drivers in the Windows VM and shut the VM down.
Now the VM doesn't boot and I'm getting:

Code:
Jan 11 19:45:57 pve kernel: [173901.567280] vfio-pci 0000:0a:00.0: Refused to change power state, currently in D3
Jan 11 19:45:58 pve kernel: [173902.383317] vfio-pci 0000:0a:00.0: timed out waiting for pending transaction; performing function level reset anyway
Jan 11 19:45:59 pve kernel: [173903.631345] vfio-pci 0000:0a:00.0: not ready 1023ms after FLR; waiting
Jan 11 19:46:00 pve kernel: [173904.687391] vfio-pci 0000:0a:00.0: not ready 2047ms after FLR; waiting
Jan 11 19:46:03 pve kernel: [173906.895475] vfio-pci 0000:0a:00.0: not ready 4095ms after FLR; waiting
Jan 11 19:46:07 pve kernel: [173911.247530] vfio-pci 0000:0a:00.0: not ready 8191ms after FLR; waiting
Jan 11 19:46:15 pve kernel: [173919.695770] vfio-pci 0000:0a:00.0: not ready 16383ms after FLR; waiting
Jan 11 19:46:34 pve kernel: [173937.872270] vfio-pci 0000:0a:00.0: not ready 32767ms after FLR; waiting
...

0a:00 is the PCI ID of my VGA card.
I can't figure out how to change the D3 state of the device.
I'm using an X399 AORUS PRO as the motherboard for my Proxmox server.
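To see which power state the card actually reports, here is a minimal sketch, assuming lspci from pciutils is installed and run as root (lspci only fully decodes the Power Management capability with root privileges). The helper names pci_power_state and device_power_state are hypothetical; they just pull the "Status: Dx" field that lspci -vv prints under Power Management. A card that is fully powered off may not answer config reads at all, in which case nothing is printed.

```shell
#!/bin/bash
# pci_power_state (hypothetical helper): read `lspci -vv` output on stdin and
# print the power state from the Power Management capability, e.g. D0 or D3.
# The matching line looks like:
#   Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
pci_power_state() {
    grep -oE 'Status: D[0-3]' | head -n 1 | cut -d' ' -f2
}

# device_power_state (hypothetical helper): query one device by bus address.
device_power_state() {
    lspci -vv -s "$1" | pci_power_state
}
```

Usage would be something like `device_power_state 0a:00.0` for the card in this post.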


I can't find any information about this kind of issue in https://pve.proxmox.com/wiki/Pci_passthrough.


Edit: OK, I've edited GRUB to disable power management of all PCIe ports with:
Code:
pcie_port_pm=off
and restarted the server. It seemed to work when I started the VM. But when I rebooted the VM, the whole Proxmox host became unresponsive (I could still ping it, but no VM was working and everything was frozen).
So I removed that parameter... and I don't know what to do.
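For reference, a minimal sketch of adding a kernel parameter such as pcie_port_pm=off, assuming a GRUB-booted host where the options live in the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub (hosts booted with systemd-boot keep the command line in /etc/kernel/cmdline instead). The function name add_grub_param is hypothetical; it appends the parameter only if it is not already present.

```shell
#!/bin/bash
# add_grub_param FILE PARAM (hypothetical helper): append PARAM to the
# GRUB_CMDLINE_LINUX_DEFAULT="..." line of FILE unless it is already there.
add_grub_param() {
    local file="$1" param="$2"
    if grep -qE "^GRUB_CMDLINE_LINUX_DEFAULT=\".*\b${param}\b" "$file"; then
        return 0  # already present, nothing to do
    fi
    # Insert " PARAM" just before the closing quote of that line
    sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${param}\"/" "$file"
}
```

After `add_grub_param /etc/default/grub pcie_port_pm=off`, run update-grub and reboot; undoing it means deleting the parameter from that line and running update-grub again.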

Latest errors in the logs:

Code:
Jan 13 23:13:05 pve kernel: [  218.821350] vfio-pci 0000:0a:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
Jan 13 23:18:20 pve kernel: [  533.620197] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:20 pve kernel: [  533.869587] vfio-pci 0000:0a:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jan 13 23:18:20 pve kernel: [  534.115141] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:20 pve kernel: [  534.238700] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:20 pve kernel: [  534.485595] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:20 pve kernel: [  534.499839] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0a:00.0 address=0x10369b0b60]
Jan 13 23:18:21 pve kernel: [  534.624183] AMD-Vi: Completion-Wait loop timed out
Jan 13 23:18:21 pve kernel: [  534.753249] vfio-pci 0000:0a:00.0: vfio_bar_restore: reset recovery - restoring BARs
Jan 13 23:18:21 pve kernel: [  534.753768] vfio-pci 0000:0a:00.0: vfio_bar_restore: reset recovery - restoring BARs
...



EDIT 2021: It has been confirmed that my issue was caused by two things:
  • My motherboard has a setting called "IOMMU" (in the Chipset tab) whose default value is "Auto". On this board, Auto = Disabled, so I had to set it explicitly to "Enabled".
  • My GPU was an AMD RX 5700 XT, which is not compatible with VFIO because that GPU cannot be reset into a state where VFIO can use it again: https://github.com/gnif/vendor-reset
  • The Proxmox kernel also needed an update to fix the earlier D3 state issue. But even with the current update, I'm still blocked by the previous point.
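Since the first cause was the IOMMU silently staying off, it is worth checking before anything else. A minimal sketch, assuming the standard sysfs layout: when the IOMMU is active, the kernel creates one directory per group under /sys/kernel/iommu_groups, each with a devices/ subdirectory; no groups means the IOMMU is not enabled. The function names are hypothetical, and the optional directory argument only exists so the functions can be pointed at a test directory.

```shell
#!/bin/bash
# iommu_group_count [ROOT] (hypothetical helper): number of IOMMU groups.
# 0 means the IOMMU is not active (e.g. BIOS setting left on "Auto").
iommu_group_count() {
    local root="${1:-/sys/kernel/iommu_groups}"
    if [ ! -d "$root" ]; then
        echo 0
        return
    fi
    find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l
}

# list_iommu_groups [ROOT] (hypothetical helper): print "group N: ADDRESS"
# for every device, to check that the GPU is not grouped with other devices.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}" dev group
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue   # glob matched nothing
        group="${dev#"$root"/}"     # N/devices/ADDRESS
        group="${group%%/*}"        # N
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

A non-zero `iommu_group_count` and a line such as `list_iommu_groups | grep 0a:00` showing the card in its own group are what you would hope to see.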
 
There is actually a vfio-pci module parameter, disable_idle_d3, which can be enabled. You can see it listed with:

modinfo vfio-pci

Edit the file in /etc/modprobe.d where you have configured vfio-pci and append the following to its options line:
disable_idle_d3=1

Reboot, then check the dmesg output; you should now only see normal mentions of D3:

dmesg | grep -i d3
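A minimal sketch of that edit, assuming the vfio-pci options live in a file such as /etc/modprobe.d/vfio.conf (the file name is an assumption; use whichever file you configured). The function name set_vfio_disable_idle_d3 is hypothetical. It sets disable_idle_d3=1 on an existing "options vfio-pci" line, or adds such a line if there is none. If vfio-pci is loaded from the initramfs, the change probably also needs update-initramfs -u -k all before the reboot.

```shell
#!/bin/bash
# set_vfio_disable_idle_d3 FILE (hypothetical helper): make sure the
# "options vfio-pci ..." line in FILE carries disable_idle_d3=1.
set_vfio_disable_idle_d3() {
    local file="$1"
    touch "$file"
    if grep -qE '^options vfio-pci .*disable_idle_d3=' "$file"; then
        # Parameter already present: force its value to 1
        sed -i -E '/^options vfio-pci /s/(disable_idle_d3=)[^[:space:]]*/\11/' "$file"
    elif grep -qE '^options vfio-pci( |$)' "$file"; then
        # Options line exists: append the parameter to it
        sed -i -E '/^options vfio-pci( |$)/s/$/ disable_idle_d3=1/' "$file"
    else
        # No options line yet: add one
        echo 'options vfio-pci disable_idle_d3=1' >> "$file"
    fi
}
```

After rebooting, `cat /sys/module/vfio_pci/parameters/disable_idle_d3` should print Y if the option took effect.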
 
To reset the GPU between stopping and starting a VM that uses passthrough:

Run "lspci | grep VGA -A 1" to find the correct PCI IDs.
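If you prefer not to pick the IDs out by hand, here is a minimal sketch that lists the GPU and every other function in the same slot (usually its HDMI sound counterpart), assuming the output format of lspci -D (full domain:bus:slot.function addresses in the first column). The function name gpu_functions is hypothetical, and matching on the class names "VGA compatible controller", "3D controller" and "Display controller" is an assumption about how your card is reported.

```shell
#!/bin/bash
# gpu_functions (hypothetical helper): read `lspci -D` output on stdin and print
# the address of every function that shares a slot with a display controller.
gpu_functions() {
    awk '
        {
            addr[NR] = $1
            slot[NR] = $1
            sub(/\.[0-7]$/, "", slot[NR])   # 0000:0a:00.1 -> 0000:0a:00
        }
        /VGA compatible controller|3D controller|Display controller/ {
            gpu_slot[slot[NR]] = 1
        }
        END {
            for (i = 1; i <= NR; i++)
                if (slot[i] in gpu_slot) print addr[i]
        }
    '
}
```

Running `lspci -D | gpu_functions` should list the .0 (graphics) and .1 (sound) functions of the card.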

Use this script, replacing XX with the bus number found above.

Code:
#!/bin/bash
#
# Replace XX with the bus number of your GPU (function .0) and its
# sound counterpart (function .1), as shown by lspci. Run as root.
#
GPU="0000:XX:00.0"
AUDIO="0000:XX:00.1"

echo "disconnecting AMD graphics"
echo 1 > "/sys/bus/pci/devices/${GPU}/remove"
echo "disconnecting AMD sound counterpart"
echo 1 > "/sys/bus/pci/devices/${AUDIO}/remove"
# Suspend to RAM: powering the card down is what resets it
echo "entering suspend, press the power button to continue"
echo -n mem > /sys/power/state
echo "reconnecting AMD GPU and sound counterpart"
echo 1 > /sys/bus/pci/rescan
echo "AMD graphics card successfully reset"
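One risk with this script is that it suspends the host even if the remove steps failed, for example when XX was never replaced. A minimal sketch of a guard, assuming full addresses in the domain:bus:slot.function form; the names is_pci_addr and check_devices are hypothetical, and the SYSFS override only exists for testing.

```shell
#!/bin/bash
# is_pci_addr ADDR (hypothetical helper): succeed if ADDR looks like a full
# PCI address such as 0000:0a:00.0 (so a leftover "XX" placeholder fails).
is_pci_addr() {
    [[ "$1" =~ ^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$ ]]
}

# check_devices ADDR... (hypothetical helper): succeed only if every address
# is well formed and present under $SYSFS (default /sys/bus/pci/devices).
check_devices() {
    local sysfs="${SYSFS:-/sys/bus/pci/devices}" addr
    for addr in "$@"; do
        if ! is_pci_addr "$addr"; then
            echo "bad PCI address: $addr" >&2
            return 1
        fi
        if [ ! -e "$sysfs/$addr" ]; then
            echo "no such device: $addr" >&2
            return 1
        fi
    done
}
```

Placed before the first remove step, a line like `check_devices 0000:0a:00.0 0000:0a:00.1 || exit 1` stops the script before anything is removed or the host is suspended.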
 
