Proxmox 6.2 - nvidia gpu passthrough crashes after nvidia driver update

Domino

May 17, 2020
So basically, after I updated the nvidia drivers to 446 (I subsequently went back to 445 and hit the same issue!), Windows crashes with the infamous "VIDEO TDR FAILURE" BSOD...

Now, I managed to get things going again by booting into the win10 guest in safe-mode and enabling "MSI MODE" for the GPU.

So it appears that by default the nvidia drivers don't enable msi-x, and KVM doesn't appear to like this: Windows crashes during boot as soon as it gets to the stage of loading the graphics driver.

Now looking into nvidia's history with MSI-X, they pretty much have never supported it, though you can see entries for it being set in their driver inf file.

Does anyone know if there is a cure, as far as Proxmox/KVM are concerned, to permit nvidia GPUs to work even after a driver update which effectively disables msi-x for the card? The workaround above works: boot through safe mode and use the MSI-util to enable msi-x for the card, and then there is no problem until the next driver update (god, don't you just hate updating drivers!). But I'd rather not have to do that.

In an ideal world it would be nice if nvidia simply detected that you're in KVM and enabled that msi-x registry flag, but we all know nvidia doesn't really support KVM outside of the odd hello.

Does the above issue have anything to do with MSRs, or is this something completely new?
 
Hi Domino,

Same thing happened to me (see my post here: NVIDIA GPU Passthrough No Longer Working with Windows 10 Guest VM). I was also able to fix it by enabling MSI manually through regedit.

Mine wasn't actually caused by the driver, but appears to have been caused by the Proxmox update actually changing the Device Instance Path (MSI was still enabled on the old device instance path).
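
To illustrate why a changed Device Instance Path loses the setting: Windows stores per-device settings under a registry key named after the full instance path, so a new path means a new key without the old values. Below is a rough Python sketch that compares two instance paths and reports which segment differs. Both example paths are made-up placeholders, not values from my system, and the helper names are hypothetical.

```python
# Sketch only: compare two Windows device instance paths.
# A path has the form <enumerator>\<device id>\<instance id>.

def split_instance_path(path):
    """Split an instance path into its three backslash-separated parts."""
    enumerator, device_id, instance_id = path.split("\\", 2)
    return {
        "enumerator": enumerator,
        "device_id": device_id,
        "instance_id": instance_id,
    }

def changed_parts(old_path, new_path):
    """Return the names of the parts that differ between two paths."""
    old = split_instance_path(old_path)
    new = split_instance_path(new_path)
    return [name for name in old if old[name] != new[name]]

# Hypothetical paths: same card, different instance id after a host update.
OLD = r"PCI\VEN_10DE&DEV_0000&SUBSYS_00000000&REV_A1\AAAA"
NEW = r"PCI\VEN_10DE&DEV_0000&SUBSYS_00000000&REV_A1\BBBB"

if __name__ == "__main__":
    print(changed_parts(OLD, NEW))
```

If only the instance id differs, the card itself is unchanged but Windows treats it as a new device, which would explain MSI still being enabled on the old path only.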

Interestingly though, you were able to access safe mode in Windows 10? Would you mind explaining how? I was not able to boot into safe mode and had to take a more "elaborate" route to boot and fix it.

The "msi-x registry flag", is that the same one as I used (MSISupported in regedit)? Always looking for additional options.
 
Yes, it was the same flag, 'MSISupported'.
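
For anyone who wants to script it instead of clicking through the utility, here is a minimal Python sketch that generates a .reg file setting MSISupported to 1 for one device. The key layout (Device Parameters\Interrupt Management\MessageSignaledInterruptProperties under the device's Enum entry) is the standard Windows location for this value. The instance path in the example is a made-up placeholder and the function names are hypothetical; copy the real path from Device Manager (Details tab, "Device instance path").

```python
# Sketch only: build the text of a .reg file that toggles MSISupported.

ENUM_ROOT = "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Enum"
MSI_SUBKEY = (
    "Device Parameters\\Interrupt Management"
    "\\MessageSignaledInterruptProperties"
)

def msi_key(instance_path):
    """Return the registry key that holds MSISupported for a device."""
    return ENUM_ROOT + "\\" + instance_path.strip("\\") + "\\" + MSI_SUBKEY

def msi_reg_file(instance_path, enable=True):
    """Return .reg file contents that set MSISupported to 1 or 0."""
    value = 1 if enable else 0
    return (
        "Windows Registry Editor Version 5.00\n\n"
        "[" + msi_key(instance_path) + "]\n"
        + '"MSISupported"=dword:%08x\n' % value
    )

# Hypothetical placeholder, not a real device instance path.
EXAMPLE_PATH = r"PCI\VEN_10DE&DEV_0000&SUBSYS_00000000&REV_A1\AAAA"

if __name__ == "__main__":
    print(msi_reg_file(EXAMPLE_PATH))
```

Save the output as a .reg file, import it inside the guest (safe mode works), and reboot. The same caveat applies as with the utility: a driver update may reset the value.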

Yes, I also believe it has something to do with the latest Proxmox/KVM. Only the guru devs will be able to shed some light on this, as and when they get a chance to have a closer look.

My setup is a scsi-passthrough-single controller plus a virt-scsi drive. After two BSOD failures in a row, Windows goes into recovery mode on the third reboot. There I enable the boot options (safe mode etc.), it reboots and comes up with the boot options menu, and I select 'safe boot with networking'. Once I'm in, I change the flag via MSI-Util.exe.

I guess it's best to just avoid updating nvidia drivers until the Proxmox team have had a closer look at this one. Thankfully we do have a pain-in-the-rear workaround to get us by for now.
 
Domino said: "I guess it's best to just avoid updating nvidia drivers until the Proxmox team have had a closer look at this one. Thankfully we do have a pain-in-the-rear workaround to get us by for now."

What's interesting is that I don't think I've ever had an issue with updating Nvidia drivers; I haven't ever explicitly paid attention to it, but I've had this VM up for about a year and I'm sure I've updated my Nvidia driver during that period. The problem only appeared after the Proxmox update. To further add credence that this isn't caused by the driver, I even rolled back to a two-week-old backup image I had taken and still was unable to boot the Windows guest.

I wonder if stock KVM/QEMU users are running into the problem...

-TorqueWrench
 
From what I gather from QEMU discussions, there's been a lot of work going on in that department, although it is outside the scope of the Proxmox devs. All in all there will be knock-on discoveries, new configuration requirements etc., and much of this information doesn't get out there until people like you and me stumble onto problems, which then get the attention of some devs, be it in the Proxmox park or the QEMU stadium.

I also noticed that the USB port numbers changed. I had 4 ports referenced as 1:..., and then noticed none of the USB devices were being picked up. I went into the GUI and saw that the USB port group was now 2:...
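
As a rough aid for spotting this before an upgrade bites, here is a small Python sketch that scans VM config text for usbN entries and flags the ones bound to a bus/port position (host=1-2 style) rather than a vendor:product ID (host=046d:c52b style). Port-bound entries are the ones that stop matching when the bus numbering shifts. The sample config and the function name are hypothetical, and the parser only looks at the host= option.

```python
import re

# host=<bus>-<port[.port...]> is tied to a physical port position.
PORT_RE = re.compile(r"^\d+-\d+(\.\d+)*$")
# host=<vendor>:<product> follows the device wherever it is plugged in.
ID_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$")
USB_LINE_RE = re.compile(r"^(usb\d+):\s*(.+)$")

def classify_usb_lines(config_text):
    """Map each usbN entry to 'port', 'device-id', or 'other'."""
    result = {}
    for line in config_text.splitlines():
        match = USB_LINE_RE.match(line.strip())
        if not match:
            continue
        name, options = match.groups()
        host = None
        for option in options.split(","):
            option = option.strip()
            if option.startswith("host="):
                host = option[len("host="):]
        if host and PORT_RE.match(host):
            result[name] = "port"
        elif host and ID_RE.match(host):
            result[name] = "device-id"
        else:
            result[name] = "other"
    return result

# Hypothetical config excerpt for illustration.
SAMPLE = """\
memory: 8192
usb0: host=1-2
usb1: host=046d:c52b,usb3=1
usb2: spice
"""

if __name__ == "__main__":
    for name, kind in classify_usb_lines(SAMPLE).items():
        print(name, kind)
```

Entries reported as 'port' are worth re-checking in the GUI after a host upgrade; passing devices by vendor:product ID avoids the renumbering problem, at the cost of not being able to pin a specific physical port.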

These reasons are partly why many tend to stick with the likes of VMware and Hyper-V: they may not be at the cutting edge, but things tend to keep working for longer. It's also why it's sometimes better to avoid jumping onto the latest version of anything and follow the old enterprise rule, "stay back as long as you can; if everything works, don't touch the foundations!"... LOL. But then this also stifles innovation, performance upgrades and new exotic features.

I think going forward I will run a nested version of Proxmox, with a duplicate of one of my VMs running inside the nest, and upgrade the nested Proxmox guest first to see if there are any problems with my setup. If not, I can feel a little more comfortable upgrading the primary Proxmox host. It's more to do with time really: it takes only a few minutes to do an upgrade, but a gazillion hours to fix and analyse issues. Ideally I also need a backup system in place for the Proxmox host. All the VMs have backups, I just never got round to doing that for the host itself. I'll pencil it in as a Sunday task.
 
