Can't change power state from D0 to D3hot

randoomkiller

New Member
Sep 20, 2023
Hi there,

I have a Proxmox hypervisor on which I run a Windows VM and sometimes an Ubuntu VM.
More and more often I hit a bug where, with GPU passthrough enabled, the VM is unable to boot because of power state issues with the GPU.
It worked well for a while, but recently I'm getting more of these errors.
Sometimes a reboot fixes it, but not always.

Does anyone know why this happens and how to fix it?

dmesg :

Code:
[  215.247402] ata1.00: Enabling discard_zeroes_data
[  215.248208] vfio-pci 0000:08:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[  215.268648] vfio-pci 0000:08:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[  215.272123]  sda: sda1
[  215.288102] vfio-pci 0000:08:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[  216.039182] device tap100i0 entered promiscuous mode
[  216.059654] fwbr100i0: port 1(fwln100i0) entered blocking state
[  216.059657] fwbr100i0: port 1(fwln100i0) entered disabled state
[  216.059693] device fwln100i0 entered promiscuous mode
[  216.059722] fwbr100i0: port 1(fwln100i0) entered blocking state
[  216.059723] fwbr100i0: port 1(fwln100i0) entered forwarding state
[  216.061771] vmbr0: port 2(fwpr100p0) entered blocking state
[  216.061773] vmbr0: port 2(fwpr100p0) entered disabled state
[  216.061801] device fwpr100p0 entered promiscuous mode
[  216.061825] vmbr0: port 2(fwpr100p0) entered blocking state
[  216.061825] vmbr0: port 2(fwpr100p0) entered forwarding state
[  216.063739] fwbr100i0: port 2(tap100i0) entered blocking state
[  216.063741] fwbr100i0: port 2(tap100i0) entered disabled state
[  216.063776] fwbr100i0: port 2(tap100i0) entered blocking state
[  216.063777] fwbr100i0: port 2(tap100i0) entered forwarding state
[  219.024153] vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[  219.024177] vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[  219.024184] vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
[  219.024186] vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
[  219.024187] vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
[  219.025480] vfio-pci 0000:08:00.0: No more image in the PCI ROM
[  219.044076] vfio-pci 0000:08:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
[  220.272565] vfio-pci 0000:08:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  220.304561] vfio-pci 0000:08:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  221.032390] vfio-pci 0000:08:00.0: timed out waiting for pending transaction; performing function level reset anyway
[  222.280386] vfio-pci 0000:08:00.0: not ready 1023ms after FLR; waiting
[  223.336401] vfio-pci 0000:08:00.0: not ready 2047ms after FLR; waiting
[  225.576288] vfio-pci 0000:08:00.0: not ready 4095ms after FLR; waiting
[  229.928272] vfio-pci 0000:08:00.0: not ready 8191ms after FLR; waiting
[  238.376115] vfio-pci 0000:08:00.0: not ready 16383ms after FLR; waiting
[  255.783947] vfio-pci 0000:08:00.0: not ready 32767ms after FLR; waiting
[  290.599246] vfio-pci 0000:08:00.0: not ready 65535ms after FLR; giving up
[  290.748260] fwbr100i0: port 2(tap100i0) entered disabled state
[  290.772594] fwbr100i0: port 1(fwln100i0) entered disabled state
[  290.772707] vmbr0: port 2(fwpr100p0) entered disabled state
[  290.772773] device fwln100i0 left promiscuous mode
[  290.772775] fwbr100i0: port 1(fwln100i0) entered disabled state
[  290.795133] device fwpr100p0 left promiscuous mode
[  290.795136] vmbr0: port 2(fwpr100p0) entered disabled state
[  290.891460] ata1.00: Enabling discard_zeroes_data
[  290.930613]  sda: sda1
[  291.041966] vfio-pci 0000:08:00.1: can't change power state from D0 to D3hot (config space inaccessible)
[  291.050074]  sdb: sdb1 sdb2 sdb3 sdb4 sdb5
[  291.783259] vfio-pci 0000:08:00.0: timed out waiting for pending transaction; performing function level reset anyway
[  293.031205] vfio-pci 0000:08:00.0: not ready 1023ms after FLR; waiting
[  294.087181] vfio-pci 0000:08:00.0: not ready 2047ms after FLR; waiting
[  296.231158] vfio-pci 0000:08:00.0: not ready 4095ms after FLR; waiting
[  300.583073] vfio-pci 0000:08:00.0: not ready 8191ms after FLR; waiting
[  309.030927] vfio-pci 0000:08:00.0: not ready 16383ms after FLR; waiting
[  327.462587] vfio-pci 0000:08:00.0: not ready 32767ms after FLR; waiting
[  362.277982] vfio-pci 0000:08:00.0: not ready 65535ms after FLR; giving up
[  363.368797] vfio-pci 0000:08:00.1: can't change power state from D0 to D3hot (config space inaccessible)
[  363.368808] vfio-pci 0000:08:00.0: can't change power state from D0 to D3hot (config space inaccessible)
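
To make the log above easier to read, here is a minimal sketch that keeps only the vfio-pci lines about resets and power state. The function name `vfio_events` and the keyword list are my own choices for illustration, not part of any Proxmox or kernel tool, and the sample lines are copied from the dmesg output above.

```python
import re

# Matches a dmesg line such as:
# [  222.280386] vfio-pci 0000:08:00.0: not ready 1023ms after FLR; waiting
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+vfio-pci\s+(?P<dev>[0-9a-f:.]+):\s+(?P<msg>.*)$"
)

def vfio_events(lines, keywords=("FLR", "function level reset", "power state")):
    """Return (timestamp, device, message) for vfio-pci lines that mention a keyword.

    The keyword list is an assumption about which messages matter here.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(k in m.group("msg") for k in keywords):
            events.append((float(m.group("ts")), m.group("dev"), m.group("msg")))
    return events

# A few lines taken from the dmesg output in this post.
sample = [
    "[  219.025480] vfio-pci 0000:08:00.0: No more image in the PCI ROM",
    "[  221.032390] vfio-pci 0000:08:00.0: timed out waiting for pending transaction; performing function level reset anyway",
    "[  290.599246] vfio-pci 0000:08:00.0: not ready 65535ms after FLR; giving up",
    "[  290.748260] fwbr100i0: port 2(tap100i0) entered disabled state",
    "[  291.041966] vfio-pci 0000:08:00.1: can't change power state from D0 to D3hot (config space inaccessible)",
]

for ts, dev, msg in vfio_events(sample):
    print(f"{ts:10.3f}  {dev}  {msg}")
```

Filtered this way, the sequence is: the function level reset on 08:00.0 times out, the kernel gives up after about 65 seconds, and only then do the D0 to D3hot errors appear.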
 
The error from the Proxmox UI:


no efidisk configured! Using temporary efivars disk.
kvm: vfio: Unable to power on device, stuck in D3
kvm: vfio: Unable to power on device, stuck in D3
TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -name Windows.P2V -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=6b269a63-971b-4f80-a579-4b8dac17cd0d' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/tmp/100-ovmf.fd' -smp '12,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt' -m 25000 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=fc53f5df-13d9-41cc-b011-1140c7f41ed7' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:08:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:08:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8bc0d5e2cd2c' -drive 'file=/var/lib/vz/template/iso/virtio-win-0.1.240.iso,if=none,id=drive-ide2,media=cdrom,aio=io_uring' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/disk/by-id/ata-Samsung_SSD_860_QVO_1TB_S4CZNF0MB08077X,if=none,id=drive-scsi2,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi2,id=scsi2' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 
'file=/dev/disk/by-id/ata-INTEL_SSDSC2CT240A4_CVKI3190023Q240DGN,if=none,id=drive-sata1,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'ide-hd,bus=ahci0.1,drive=drive-sata1,id=sata1,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=36:0D:C4:60:B8:3D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-6.1+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout