GPU passthrough works, but after rebooting the VM it shows error 43. Rebooting the entire PVE host fixes it, until the next VM reboot.

Istria

Hi all,

After a couple days of tinkering, reading forums, slamming my head into a wall, etc. I got GPU passthrough to a Windows 11 VM to work. (whooho!)
Actually, initially it worked on the first try following the "ultimate guide" on reddit, but only for a little while. Then I got a constant error 43, which took me days to solve. The solution was extracting the BIOS from the GPU using GPU-Z and loading it. I hadn't tried this option earlier because passthrough had worked without it the first time, so I assumed it could not be the problem. But after doing that anyway, it worked again, and it has kept working ever since.

But now I ran into the following issue:
The passthrough works after starting the VM for the first time. After I reboot or shutdown and restart the VM, I'm greeted again by code 43. Only rebooting the entire PVE fixes it.

On the first boot after restarting PVE, in dmesg I see:
Code:
[    5.076354] VFIO - User Level meta-driver version: 0.3
[    5.081144] vfio-pci 0000:02:00.0: vgaarb: deactivate vga console
[    5.081148] vfio-pci 0000:02:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
[    5.081286] vfio_pci: add [10de:1287[ffffffff:ffffffff]] class 0x000000/00000000
[    5.414973] vfio_pci: add [10de:0e0f[ffffffff:ffffffff]] class 0x000000/00000000

But after the VM reboot, nothing new shows up in dmesg with "vfio" in it. Is this normal?
Here is a pastebin link to the whole dmesg output:
pastebin dmesg

Any ideas? Thanks in advance! Let me know what other info or logs could be helpful to share!

101.conf
Code:
args: -cpu host,-hypervisor,kvm=off, -smbios type=0,vendor="American Megatrends Inc.",version=F2,date="06/07/2023"
balloon: 0
bios: ovmf
boot: order=sata0;net0
cores: 4
cpu: host,hidden=1
efidisk0: local-lvm:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:02:00.0,pcie=1,romfile=GK208.rom
machine: pc-q35-9.0
memory: 12288
meta: creation-qemu=9.0.2,ctime=1736987549
name: VirtualMachine1
net0: e1000=BC:24:11:5C:C6:05,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
sata0: local-lvm:vm-101-disk-1,backup=0,size=128G
scsihw: lsi
smbios1: uuid=03560274-043c-05c4-8806-bd0700080009,manufacturer=R2lnYWJ5dGUgVGVjaG5vbG9neSBDby4sIEx0ZC4=,product=SDUxME0gSCBWMg==,ver>
sockets: 1
tpmstate0: local-lvm:vm-101-disk-2,size=4M,version=v2.0
vga: none
vmgenid: e218a5b6-acc0-4b9e-b3a4-e702d3b367bc
 
The passthrough works after starting the VM for the first time. After I reboot or shutdown and restart the VM, I'm greeted again by code 43. Only rebooting the entire PVE fixes it.
This is not uncommon if the device does not reset properly. Sometimes there are work-arounds that can be found on the internet from people who did passthrough with the same device. Other times, one has to live with it or switch to a different device (that is known to work well with passthrough).
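One way to get a first hint about whether the card can be reset at all is to look at what the kernel exposes for it in sysfs. The following is only a rough sketch: the function name pci_reset_info and the SYSFS_PCI override are made up for illustration, and the reset_method file only exists on newer kernels. A missing reset file suggests the kernel found no way to reset that function, but a present one does not guarantee the reset actually leaves the GPU in a usable state.

```shell
#!/bin/sh
# Report which reset mechanisms the kernel exposes for a PCI device.
# SYSFS_PCI is a hypothetical override so the function can be tried on a fake tree.
pci_reset_info() {
    dev="$1"                                   # PCI address, e.g. 0000:02:00.0
    base="${SYSFS_PCI:-/sys/bus/pci/devices}/$dev"
    if [ ! -d "$base" ]; then
        echo "no such device: $dev"
        return 1
    fi
    # The "reset" attribute is only present when the kernel knows a reset method.
    if [ -e "$base/reset" ]; then
        echo "reset: supported"
    else
        echo "reset: not exposed"
    fi
    # "reset_method" lists the available methods on newer kernels only.
    if [ -r "$base/reset_method" ]; then
        echo "methods: $(cat "$base/reset_method")"
    fi
}
```

On the host from this thread it would be called as pci_reset_info 0000:02:00.0 (the address taken from the hostpci0 line above).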
 
Thanks for your reply. Any suggestion on where I could find a list of GPUs that work well with passthrough? Or those work-arounds you mentioned?

This GT730 is more for testing purposes. For our office, I need to build a CAD server with 4 GPUs and 4 VMs running AutoCAD, which people can RDP into to do CAD work.
For this I was looking into GT710, GT1030 or GTX1050 (we only do very light 3d modelling). The GT730 works perfectly fine for us. I never see it peak above 50% core usage and 600MB VRAM.
 
Thanks for your reply. Any suggestion on where I could find a list of GPUs that work well with passthrough? Or those work-arounds you mentioned?
I switched to AMD GPUs long ago because of NVidia driver issues within VMs and have reported on working GPUs before: https://forum.proxmox.com/threads/any-recommendations-on-“gaming”-gpu-for-vm.160446/post-737669
This GT730 is more for testing purposes. For our office, I need to build a CAD server with 4 GPUs and 4 VMs running AutoCAD, which people can RDP into to do CAD work.
For this I was looking into GT710, GT1030 or GTX1050 (we only do very light 3d modelling). The GT730 works perfectly fine for us. I never see it peak above 50% core usage and 600MB VRAM.
This problem is not Proxmox specific and work-arounds might be found elsewhere as KVM/QEMU and VFIO are standard Linux technologies. Or maybe someone here who knows one for GT730 might see this thread.
 
Thanks for your help so far. We have ordered a new PC (with multiple PCIe x16 slots) on which I will continue my testing.
The current test setup has worked fine this week. It has now even started surviving VM reboots again, without having to reboot the PVE host.

EDIT:
Just wondering about the following:
Everywhere I read that you always have to pass through both the GPU and its audio function. They also used to be in the same IOMMU group (12).
After adding the BIOS file to the GPU, I noticed the audio device was now in IOMMU group 13, while the GPU part was still in IOMMU group 12. And it only works now if I pass through only the GPU and NOT the audio part of it. When I pass through the GPU with "all functions" on, or pass them both through separately, I get a code 43 on the GPU.
 
Everywhere I read that you always have to pass through both the GPU and its audio function. They also used to be in the same IOMMU group (12).
After adding the BIOS file to the GPU, I noticed the audio device was now in IOMMU group 13, while the GPU part was still in IOMMU group 12.
The IOMMU groups are determined by the motherboard (and its BIOS) and the PCIe layout. Usually you pass the whole GPU device (which is multi-function).
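To see how the host currently groups the devices, the group directories under /sys/kernel/iommu_groups can be listed. This is just a minimal sketch; the function name list_iommu_groups and the IOMMU_ROOT override are made up for illustration.

```shell
#!/bin/sh
# Print "group <n>: <pci-address>" for every device in every IOMMU group.
# IOMMU_ROOT is a hypothetical override so the function can be tried on a fake tree.
list_iommu_groups() {
    root="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue              # glob matched nothing
        group="${dev#"$root"/}"                # strip the root prefix
        group="${group%%/*}"                   # keep only the group number
        echo "group $group: ${dev##*/}"
    done | sort -k2,2n                         # order by group number
}
```

If the VGA function (02:00.0 here) and the audio function (02:00.1) show up under different group numbers, they have been split into separate groups.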
What do you mean by "adding the BIOS file to the GPU"?
The only way (that I know of) to break up the IOMMU groups is pcie_acs_override=downstream,multifunction, which usually splits the VGA function and the audio function into separate groups. Did you use that (what is the output of cat /proc/cmdline)?
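If the full command line is long, the relevant option can be picked out of it. A small sketch, where the function name acs_override_setting is made up and the file argument exists only so it can be tried on a sample file instead of /proc/cmdline:

```shell
#!/bin/sh
# Print the pcie_acs_override=... token from a kernel command line, if present.
# The optional file argument defaults to /proc/cmdline.
acs_override_setting() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each option on its own line, then keep only the ACS override one.
    tr ' ' '\n' < "$cmdline_file" | grep '^pcie_acs_override=' \
        || echo "pcie_acs_override not set"
}
```

Running acs_override_setting without arguments checks the currently booted kernel command line.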
And it only works now if I pass through only the GPU and NOT the audio part of it. When I pass through the GPU with "all functions" on, or pass them both through separately, I get a code 43 on the GPU.
I cannot explain that. I don't have that hardware and I don't use those Windows drivers (which return the very generic code 43).