PVE 5.3 (nvidia+) passthrough bug

Republicus

I have been troubleshooting Code 43 on a Windows 10 VM with a GTX 1060 for days, and then decided to pass the GPU through to a Linux guest VM to test the Nvidia drivers there. In both guests the drivers crash the VM.

These are the same problems I had earlier when passing through an older-generation card without the kvm=off setting.

I believe Proxmox VE is not properly hiding virtualization from the guest, which the x-vga=on setting is supposed to do by passing kvm=off and hv_vendor_id=proxmox to QEMU.

Note, though, that I do see both of those in the command line.

Is anyone else having success with Nvidia passthrough and PVE 5.3?

My other system was downgraded to PVE 5.2 immediately after upgrading to PVE 5.3 because VMs would not pass through AMD/ATI cards that were working fine under 5.2. Is this maybe a bigger issue?

qm showcmd 100
Code:
/usr/bin/kvm -id 100 -name win10 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=62eaa93e-8848-480a-8d67-4780ef548e74' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=qcow2,id=drive-efidisk0,file=/media/ssd/images/100/vm-100-disk-1.qcow2' -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_synic,hv_stimer,enforce,kvm=off' -m 4096 -object 'memory-backend-ram,id=ram-node0,size=4096M' -numa 'node,nodeid=0,cpus=0-3,memdev=ram-node0' -device 'vmgenid,guid=5abc790e-e842-4ce6-aeb6-5da64159786f' -readconfig /usr/share/qemu-server/pve-q35.cfg -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=00:1a.0,id=hostpci0,bus=pci.0,addr=0x10' -device 'vfio-pci,host=01:00.0,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0' -device 'usb-host,vendorid=0x1532,productid=0x0053,id=usb0' -device 'usb-host,vendorid=0x3938,productid=0x1095,id=usb1' -chardev 'socket,path=/var/run/qemu-server/100.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8619780584e' -drive 'if=none,id=drive-ide0,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 
'file=/media/ssd/images/100/vm-100-disk-0.qcow2,if=none,id=drive-scsi1,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1,rotation_rate=1,bootindex=200' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=C6:A8:5C:F8:B1:9A,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -rtc 'driftfix=slew,base=localtime' -machine 'type=q35' -global 'kvm-pit.lost_tick_policy=discard'
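
The -cpu argument is easy to lose among the other options in output like the above. Here is a minimal sketch for checking it. The helper name check_cpu_flags is hypothetical, and it only confirms that the flags appear in the command line, not that the guest driver is actually fooled by them.

```shell
#!/bin/sh
# Hypothetical helper: reads a QEMU command line on stdin and reports whether
# the two flags discussed in this thread appear in the -cpu argument.
check_cpu_flags() {
    # Drop the single quotes, then keep only the word that follows " -cpu ".
    cpu_arg=$(tr -d "'" | sed -n 's/.* -cpu \([^ ]*\).*/\1/p')
    if [ -z "$cpu_arg" ]; then
        echo "no -cpu argument found"
        return 1
    fi
    status=0
    for flag in kvm=off hv_vendor_id=; do
        # Surround with commas so each flag is matched as a list element prefix.
        case ",$cpu_arg," in
            *",$flag"*) echo "present: $flag" ;;
            *)          echo "missing: $flag"; status=1 ;;
        esac
    done
    return $status
}
```

On the host it could be used as `qm showcmd 100 | check_cpu_flags`, with 100 being the VM ID from this post.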

Package Versions
Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-10-pve) 
pve-manager: 5.3-8 (running version: 5.3-8/2929af8e) 
pve-kernel-4.15: 5.3-1 
pve-kernel-4.15.18-10-pve: 4.15.18-32 
pve-kernel-4.15.18-9-pve: 4.15.18-30 
corosync: 2.4.4-pve1 
criu: 2.11.1-1~bpo90 
glusterfs-client: 3.8.8-1 
ksm-control-daemon: 1.2-2 
libjs-extjs: 6.0.1-2 
libpve-access-control: 5.1-3 
libpve-apiclient-perl: 2.0-5 
libpve-common-perl: 5.0-43 
libpve-guest-common-perl: 2.0-19 
libpve-http-server-perl: 2.0-11 
libpve-storage-perl: 5.0-36 
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-1 
lxcfs: 3.0.2-2 
novnc-pve: 1.0.0-2 
proxmox-widget-toolkit: 1.0-22 
pve-cluster: 5.0-33 
pve-container: 2.0-33 
pve-docs: 5.3-1 
pve-edk2-firmware: 1.20181023-1 
pve-firewall: 3.0-17 
pve-firmware: 2.0-6 
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1 
pve-xtermjs: 1.0-5 
qemu-server: 5.0-44 
smartmontools: 6.5+svn4324-1 
spiceterm: 3.0-5 
vncterm: 1.5-3 
zfsutils-linux: 0.7.12-pve1~bpo1
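
When comparing a working 5.2 host with a failing 5.3 host, only a few of these packages are likely to matter for passthrough. Here is a rough sketch that pulls them out of a version listing in the format above (as printed by `pveversion -v`). The function name passthrough_versions and the choice of packages are assumptions for illustration, not an official list.

```shell
#!/bin/sh
# Hypothetical helper: reads "package: version ..." lines on stdin and prints
# "package version" for a hand-picked set of passthrough-related packages.
passthrough_versions() {
    awk -F': ' '
        $1 == "proxmox-ve" || $1 == "pve-qemu-kvm" ||
        $1 == "qemu-server" || $1 == "pve-edk2-firmware" {
            # Keep only the first word after the colon (the version itself).
            split($2, v, " ")
            print $1, v[1]
        }'
}
```

Running `pveversion -v | passthrough_versions` on both hosts and diffing the two results would show which of these components changed between the releases.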
 
I have a GTX 750 Ti that has worked for a year (so on both 5.2 and 5.3), but my 1050 Ti has ONLY worked on 5.2. As soon as 5.3 dropped, all I got was Code 43 on the 1050 Ti, while my 750 Ti is working just fine.

And yes, I am using UEFI BIOS in the VM. I dumped my own ROM (not from techrepublics db!) and tried all kinds of flags, but still have no luck. Maybe I should try downgrading to 5.2 and see, but I will most probably start with a fresh VM and a new dump of the ROM (in case it got updated by Nvidia's drivers?) before going that route.
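
Before re-dumping, it can be worth checking that the existing ROM file at least starts with the standard PCI option ROM signature (bytes 0x55 0xAA). Here is a minimal sketch. The function name rom_has_signature is hypothetical, and the check looks only at the first two bytes, so a file carrying an extra vendor header in front of the image would fail it even though the image itself may be intact.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file given as $1 begins with the
# PCI option ROM signature 0x55 0xAA.
rom_has_signature() {
    # Read the first two bytes as hex and strip the whitespace od adds.
    sig=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

Usage would look like `rom_has_signature gtx1050ti.rom && echo "signature ok"`, where gtx1050ti.rom is a placeholder file name.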
 
Could you try setting the machine version to 2.11?
For i440fx:

qm set ID -machine pc-i440fx-2.11

For q35:

qm set ID -machine pc-q35-2.11
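
Machine type strings are easy to mistype, and a wrong one may only surface when the VM fails to start. A small sketch that builds the string from its parts and rejects unknown chipsets follows. The helper name machine_type is hypothetical, and the two chipset names are just the ones discussed here.

```shell
#!/bin/sh
# Hypothetical helper: prints a versioned machine type such as pc-q35-2.11.
# $1 = chipset (i440fx or q35), $2 = machine version (e.g. 2.11)
machine_type() {
    chipset=$1
    version=$2
    case "$chipset" in
        i440fx|q35) ;;
        *) echo "unknown chipset: $chipset" >&2; return 1 ;;
    esac
    echo "pc-$chipset-$version"
}
```

For example, `qm set 100 -machine "$(machine_type q35 2.11)"` for a VM with ID 100. The resulting setting should then be visible in the output of `qm config 100`.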
 

I have tried both pc-i440fx-2.11 and pc-q35-2.11 with no joy: Code 43 on both in Windows.

Linux attempts to load the Nvidia driver and fails (see attached pic).

This is the same behavior I saw in earlier versions of PVE before appending kvm=off, when the Nvidia driver detected a virtual machine and failed to load.

One interesting note to add: before converting my system over to PVE, I had Windows 10 installed bare-metal on an SSD on the same hardware. After installing PVE 5.3 onto a separate disk, I created a VM and attached a USB controller and the Nvidia GTX 1060. I also attached the SSD with the old Windows 10 installation directly to the VM, and the graphics card worked fine.

I then decided to make better use of my SSD by starting from a fresh install and virtualizing the disks on the SSD, so I could perform backups and use other PVE features. This is when the GPU failures began.

I have tried every way I can think of to try and resolve this issue.
 

Attachments

  • gtx1060-linux-code43.jpg

Hopefully you can test it using a different disk, in case 5.2 does not resolve anything.
Downgrading my other system to 5.2 was necessary to get my AMD GPU working again. I am not yet prepared to downgrade my second system with the Nvidia GPU. I may have to, though.

I would be interested to hear your results. It's possible 5.2 has some fixes that are missing in 5.3.
PM me if you want to collaborate on Discord etc.
 
Is anyone else having success with Nvidia passthrough and PVE 5.3?

I have a different experience: passthrough of my secondary NVIDIA GeForce 1080 GPU works just fine on a Proxmox 5.3-6 host in a Windows 10 1803/1809 VM, with NVIDIA driver versions 398.11 and 417.35.
 
