[SOLVED] Help with single GPU Passthrough

Ventority

Jan 5, 2023
Hey there,
I am new to Proxmox and I'm trying to set up a Windows VM. It's just an experiment and nothing too serious, so if I can't fix the problem, it's not a big deal. I currently have only one GPU in my PC and I want to pass it through to the VM. I know that's not recommended, but I wanted to try it anyway. I fixed a couple of errors I encountered during the setup, but now I'm stuck with the following error when I start my VM:
kvm: vfio: Unable to power on device, stuck in D3.
I googled around and found no solution. I found some threads where people fixed it, but none of their solutions worked for me.

When I run dmesg I get the following error.
[ 71.518070] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[ 71.518089] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 71.518096] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
[ 71.518098] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
[ 71.518100] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
[ 71.541917] vfio-pci 0000:06:00.1: enabling device (0000 -> 0002)
[ 71.542061] vfio-pci 0000:06:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
[ 72.798528] vfio-pci 0000:06:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 72.830345] vfio-pci 0000:06:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 73.558255] vfio-pci 0000:06:00.0: timed out waiting for pending transaction; performing function level reset anyway
[ 74.806019] vfio-pci 0000:06:00.0: not ready 1023ms after FLR; waiting
[ 75.862367] vfio-pci 0000:06:00.0: not ready 2047ms after FLR; waiting
[ 77.942241] vfio-pci 0000:06:00.0: not ready 4095ms after FLR; waiting
[ 82.294232] vfio-pci 0000:06:00.0: not ready 8191ms after FLR; waiting
[ 90.742323] vfio-pci 0000:06:00.0: not ready 16383ms after FLR; waiting
[ 107.894543] vfio-pci 0000:06:00.0: not ready 32767ms after FLR; waiting
[ 142.710985] vfio-pci 0000:06:00.0: not ready 65535ms after FLR; giving up
[ 143.252159] vfio-pci 0000:06:00.1: can't change power state from D0 to D3hot (config space inaccessible)
[ 143.991015] vfio-pci 0000:06:00.0: timed out waiting for pending transaction; performing function level reset
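
For anyone reading a log like this, here is a minimal Python sketch that groups the vfio-pci messages by PCI address and picks out the ones that look like reset or power-state failures. The function name `summarize_vfio` and the list of failure phrases are my own assumptions, chosen from the messages in this log; they are not an official list of kernel errors.

```python
import re
from collections import defaultdict

# One kernel log entry: "[ 71.518070] vfio-pci 0000:06:00.0: message".
# The lookahead ends a message at the next "[ timestamp]" or at the end of
# the text, so entries pasted on one line are handled as well.
ENTRY = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+vfio-pci\s+"
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"(?P<msg>.*?)(?=\s*\[\s*\d+\.\d+\]|\s*$)",
    re.DOTALL,
)

# Assumed failure phrases, taken from the messages seen in this thread.
FAILURE_HINTS = (
    "not ready",
    "giving up",
    "can't change power state",
    "invalid power transition",
    "timed out",
)


def summarize_vfio(log_text):
    """Group vfio-pci messages by device and list likely failures."""
    per_device = defaultdict(list)
    for match in ENTRY.finditer(log_text):
        per_device[match.group("dev")].append(match.group("msg").strip())

    summary = {}
    for dev, messages in per_device.items():
        failures = [m for m in messages if any(h in m for h in FAILURE_HINTS)]
        summary[dev] = {"messages": len(messages), "failures": failures}
    return summary
```

Feeding it the text of `dmesg` (for example from a saved file) gives one entry per device, which makes it easier to see that 0000:06:00.0 is the function that never comes back after the function level reset.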

Maybe someone here has an idea on how to fix it?
 
UPDATE:
I fixed this error by simply adding the GPU's sound card to the PCIe devices as well. But now I have a different problem:

TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -name 'Win10SSD,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=a4637879-c93b-4895-a08c-7b027243ff6f' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd' -drive 'if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/pve/vm-100-disk-0,size=540672' -smp '6,sockets=1,cores=6,maxcpus=6' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/100.vnc,password=on' -no-hpet -cpu qemu64 -m 8192 -object 'memory-backend-ram,id=ram-node0,size=8192M' -numa 'node,nodeid=0,cpus=0-5,memdev=ram-node0' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=471fffcd-24f0-40d6-8cf8-a5ae52ad61a7' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:06:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on,romfile=/usr/share/kvm/GA106.rom' -device 'vfio-pci,host=0000:06:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -device 'VGA,id=vga,bus=pcie.0,addr=0x1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:5a80d71b850' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=/dev/disk/by-id/ata-CT500MX500SSD1_1814E13541F2,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 
'virtio-net-pci,mac=CA:F0:60:F7:3D:C4,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024,bootindex=100' -rtc 'driftfix=slew,base=localtime' -machine 'accel=tcg,type=pc-q35-7.1+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout

I didn't find much about the error on Google. It's kind of weird, and I think it has something to do with the GPU passthrough, because if I start the VM without the PCIe devices attached, it starts just fine, but without a GPU of course. Is there a stupid mistake I'm making or will it just not work?
 
If you get a timeout without any error messages in the system logs, then it's because of the memory. How much of your system's memory are you giving to the VM? Passthrough requires that all VM memory is pinned into physical host memory (and therefore ballooning also does not work). Try starting the VM with much less memory, like 2 or 3GB.
If that does not resolve the issue, please share information about your hardware and the VM configuration file.
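
As a rough way to check the memory point above, the following Python sketch compares the VM's configured memory with what the host reports as available. It assumes the usual `MemAvailable:` line from `/proc/meminfo` and the `memory:` line (in MiB) from the VM configuration file; the function names and the 1024 MiB headroom are hypothetical choices for illustration, not Proxmox values.

```python
import re


def mem_available_mib(meminfo_text):
    """Return MemAvailable from /proc/meminfo text, converted to MiB."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found in meminfo text")
    return int(match.group(1)) // 1024


def vm_memory_mib(config_text):
    """Return the 'memory:' value (MiB) from a VM configuration file."""
    match = re.search(r"^memory:\s*(\d+)\s*$", config_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'memory:' line found in VM config")
    return int(match.group(1))


def can_pin(vm_mib, available_mib, headroom_mib=1024):
    """True if the VM memory plus an assumed headroom fits in available RAM."""
    return vm_mib + headroom_mib <= available_mib
```

Typical use would be to read `/proc/meminfo` and the VM's config file as text and pass both through these helpers; a `False` result only hints that pinning all VM memory might fail, it does not prove it.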
 
Hey, thank you for the advice. Unfortunately, the VM still does not boot with 2GB of memory. My host has 24GB of RAM, combined with an R5 1600X and an RTX 3050 on an ASRock B450M Pro4. The VM has a physical SSD passed through as its boot drive. Everything works fine when I remove the PCIe passthrough. I'm currently getting this error when running dmesg:
[ 259.312710] vfio-pci 0000:06:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 259.313588] vfio-pci 0000:06:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 259.313594] vfio-pci 0000:06:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 260.349952] vfio-pci 0000:06:00.0: invalid power transition (from D3cold to D3hot)
[ 260.349961] vfio-pci 0000:06:00.1: can't change power state from D0 to D3hot (config space inaccessible)
Maybe that helps?

This is the content of my config file:
bios: ovmf
boot: order=net0
cores: 6
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:06:00,pcie=1,romfile=GA106.rom,x-vga=1
hostpci1: 0000:06:00,pcie=1,x-vga=1
kvm: 1
machine: pc-q35-7.1
memory: 2048
meta: creation-qemu=7.1.0,ctime=1673022643
name: Win10SSD
net0: virtio=CA:F0:60:F7:3D:C4,bridge=vmbr0,firewall=1
numa: 1
ostype: win10
sata0: /dev/disk/by-id/ata-CT500MX500SSD1_1814E13541F2,size=488386584K
scsihw: virtio-scsi-single
smbios1: uuid=a4637879-c93b-4895-a08c-7b027243ff6f
sockets: 1
vmgenid: 471fffcd-24f0-40d6-8cf8-a5ae52ad61a7
 
Hey, thank you for the advice. Unfortunately, the VM still does not boot with 2GB of memory. My host has 24GB of RAM, combined with an R5 1600X and an RTX 3050 on an ASRock B450M Pro4.
Sorry, I missed that your first post already contained an error message. This is not a memory issue because there are error messages.
This is the content of my config file:
...
hostpci0: 0000:06:00,pcie=1,romfile=GA106.rom,x-vga=1
hostpci1: 0000:06:00,pcie=1,x-vga=1
...
Don't add the GPU's functions (06:00.0 and 06:00.1) as separate entries. Pass through only 06:00.0 with All Functions, PCI-Express, Primary GPU and ROM-File enabled (if that's the actual/patched GPU ROM), and remove the hostpci1 line from this file.
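
To illustrate the kind of overlap described here, this is a small Python sketch that scans a VM configuration for `hostpciN` entries pointing at the same device. It assumes the first comma-separated field of each entry is the PCI address (optionally prefixed with `host=`) and that an address without a `.function` suffix covers all functions; the helper names are hypothetical.

```python
import re

HOSTPCI = re.compile(r"^(hostpci\d+):\s*(.+)$")


def parse_hostpci(config_text):
    """Return {key: (slot, function_or_None)} for every hostpciN line."""
    entries = {}
    for line in config_text.splitlines():
        match = HOSTPCI.match(line.strip())
        if not match:
            continue
        key, value = match.groups()
        host = value.split(",")[0].strip()
        if host.startswith("host="):
            host = host[len("host="):]
        slot, dot, func = host.partition(".")
        if slot.count(":") == 1:
            slot = "0000:" + slot  # assume PCI domain 0000 when it is omitted
        entries[key] = (slot, func if dot else None)
    return entries


def find_overlaps(config_text):
    """List (key_a, key_b, slot) for entries that claim the same function."""
    entries = sorted(parse_hostpci(config_text).items())
    overlaps = []
    for i, (key_a, (slot_a, func_a)) in enumerate(entries):
        for key_b, (slot_b, func_b) in entries[i + 1:]:
            # None means "all functions", which overlaps with any function.
            shares_function = func_a is None or func_b is None or func_a == func_b
            if slot_a == slot_b and shares_function:
                overlaps.append((key_a, key_b, slot_a))
    return overlaps
```

Run against the configuration quoted above, it reports `hostpci0` and `hostpci1` as both claiming `0000:06:00`, which is the duplication being pointed out.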
 
That didn't do anything. The ROM was extracted using GPU-Z in Windows, but I didn't find a way to patch it for 30-series cards.
 
I have no experience with NVIDIA cards, but I would expect a modern GPU to reset/work without a ROM-file.
You probably need this work-around, since your system has only a single GPU.
But just as important: fix the VM configuration by removing hostpci1 and only using hostpci0.
 
I've read through the whole thread and tried the fixes they provided but no luck. I fixed the VM config but it does nothing. When I boot the PC, I see the "loading initial ramdisk" message (I don't know what this stage is called) and that's it. So my system doesn't use the GPU, right? When I try to start the VM, the framebuffer gets reset and the monitor loses its signal.
 
I've read through the whole thread and tried the fixes they provided but no luck. I fixed the VM config but it does nothing.
You do need those two changes, but maybe also a BIOS update. Some AMD AGESA versions simply break passthrough. AGESA 1.0.0.3 ABB worked best for my ASRock X470; everything before that (after AGESA 1.0.0.6 in the old versioning) and after that broke passthrough, until AGESA Combo V2 PI 1.2.0.7, which started working again. For your motherboard, I suggest trying version 3.90, 5.40 [Beta] or the latest 5.70. Since you have an old Ryzen CPU, try 3.90 first and stick with it if it works. What BIOS version are you currently on?
 
That's what I thought as well. Yesterday, I updated from 5.17 (I think) to 5.70, and neither version worked. I will try to downgrade to 3.90 in a minute. Which settings are must-haves? The only things I changed were enabling Above 4G Decoding, IOMMU and ACS, and disabling CSM.
 
Start with Optimized Default and try disabling Above 4G Decoding; enabling it tends to cause problems. IOMMU and ACS should be enabled. Also disable Resizable BAR (or whatever it's called).
 
I just downgraded to 3.90 and I still have the same problem, so I guess updating to 5.40 or anything else wouldn't make a difference either. Maybe I'm just unlucky and will not be able to pass through the GPU :/
 
When running dmesg, I get a slightly different error message now:
[ 471.329658] vfio-pci 0000:06:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 471.329757] vfio-pci 0000:06:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 472.080147] vfio-pci 0000:06:00.0: timed out waiting for pending transaction; performing function level reset anyway
...
[ 543.153717] vfio-pci 0000:06:00.0: invalid power transition (from D3cold to D3hot)
Any idea what that means?
 
The GPU is not resetting properly. This is either a GPU problem or a motherboard BIOS problem. Maybe there's a work-around for this particular GPU (but I don't know it). Try another GPU, or try passthrough of another device (one that is alone in an IOMMU group), to see if it only happens with this device. Or use another GPU to boot the system to work around the "single GPU" problems, but your Micro-ATX motherboard probably does not support booting from the x4 PCIe slot.
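
For finding a device that is alone in its IOMMU group, here is a minimal Python sketch. It assumes the standard Linux sysfs layout, `/sys/kernel/iommu_groups/<group>/devices/<pci-address>`; the function names are my own, and the `root` parameter exists only so the code can be pointed at a different directory.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from sysfs."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # No directory usually means the IOMMU is disabled or unsupported.
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def isolated_devices(groups):
    """Return the devices that are the only member of their IOMMU group."""
    return sorted(devs[0] for devs in groups.values() if len(devs) == 1)
```

Calling `isolated_devices(iommu_groups())` on the host lists candidates for a test passthrough. A GPU normally shares its group with its own audio function, so it would not show up in that list even when its grouping is fine.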
 
I may be able to get a 710, but only for a PCIe x1 slot, in a couple of days. I will try it and give an update. Where can I search for a workaround? Google isn't that helpful :/
 
I may be able to get a 710, but only for a PCIe x1 slot, in a couple of days. I will try it and give an update.
Or try any other device; it does not have to be a GPU.
Where can I search for a workaround? Google isn't that helpful :/
I don't know. I have no experience with NVIDIA GPUs because NVIDIA used to break passthrough on purpose.
 
I'll try it tomorrow. What about installing NVIDIA drivers? None of the how-tos I saw mentioned it, but do I have to install the drivers?
 
Tried it with a different PCI device and Windows boots (or at least I get as far as the Windows Boot screen). But still no luck with the GPU :/
 
I finally got it working by myself :)
I unchecked all the boxes for the PCIe passthrough device in the config, and the VM booted and the RTX 3050 was recognized in Windows. Then I enabled one box at a time, and it turns out that leaving "ROM-Bar" and "All Functions" disabled did the trick. Why, I can't tell, but I benchmarked the setup using Cyberpunk's built-in benchmark (yes, not real benchmarking software, but it is a gaming setup so it should be enough) and there was a difference of 0.13 in average FPS, so there is no loss in performance with these two things disabled. Thank you @leesteken for trying to help me, I really appreciate it :) Hopefully someone in the future can profit from this thread.
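
For readers who prefer editing the configuration file, the sketch below shows how the passthrough checkboxes are assumed to map onto a `hostpciN` value: All Functions to an address without a `.function` suffix, PCI-Express to `pcie=1`, Primary GPU to `x-vga=1`, and an unchecked ROM-Bar to `rombar=0`. The two boxes left unchecked above are taken here to be the GUI's ROM-Bar and All Functions options. This mapping and the helper name `hostpci_value` are assumptions for illustration; the exact line used in this setup was not posted.

```python
def hostpci_value(slot, function=0, all_functions=False, pcie=True,
                  rom_bar=True, primary_gpu=False, romfile=None):
    """Build the value part of a 'hostpciN:' line from checkbox states.

    slot is the PCI address without the function, e.g. '0000:06:00'.
    """
    # All Functions: pass the whole slot; otherwise name a single function.
    host = slot if all_functions else f"{slot}.{function}"
    parts = [host]
    if pcie:
        parts.append("pcie=1")
    if not rom_bar:
        # Only written when disabled; the ROM BAR is assumed on by default.
        parts.append("rombar=0")
    if romfile:
        parts.append(f"romfile={romfile}")
    if primary_gpu:
        parts.append("x-vga=1")
    return ",".join(parts)
```

With `rom_bar=False`, `all_functions=False`, `pcie=True` and `primary_gpu=True`, the helper yields `0000:06:00.0,pcie=1,rombar=0,x-vga=1`, which is one plausible reading of the working combination. Compare the result with what the GUI writes before relying on it.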
 
Hi, I ran into the same problem. Could you please tell me how to disable all the checkboxes for the PCIe passthrough in the config?
 
