Windows VM GPU passthrough bootloop/bluescreen fix

RAJ-rios

New Member
Jun 16, 2024
This is a write-up aggregating several great posts I found online over the course of four days. You can find numerous instructions for the basics of GPU passthrough on every version of Proxmox, so I won't go into that here. This is just the last step in the process, to fix what has probably been a very frustrating problem: you see your Windows VM begin to boot on your external monitor, only for it to immediately freeze. This applies to you if you see the OVMF UEFI and then a black screen/broken image/no signal on your external monitor, and/or if noVNC shows the VM bluescreening (due to video driver failure) or only fully booting once the GPU is removed from the passthrough list.

This post was greatly assisted by TorqeWrench's post here, which provided the foundation I needed to solve my issue and streamline the process. To recap, in brief:

Apparently there is a disconnect between Proxmox (since 6.1) and Windows: Windows does not enable the flag for Message Signaled Interrupts (MSI) on the passed-through GPU. It appears that when the UEFI hands the GPU to the OS and the video driver loads, the card stops getting proper IRQ handling.

The solution is to manually enable MSI in the registry. The GPU in this case was from Nvidia. My method is as follows:
Read through the instructions and download the necessary software in advance. Obviously back up any data you can't afford to lose, usual disclaimer blah blah.

1. Set the VM display back to Default and remove the GPU from the hardware list (see the example commands after this list). This will allow you to use noVNC for the following steps.
2. Boot the Windows VM and run msconfig to set startup to Safe Mode with networking. Restart.
3. Run DDU to destroy any trace of the video driver. You can also use DDU to quickly set Windows Update not to download drivers automatically, or you can do it manually.
4. Restart (it will reboot into Safe Mode again).
5. I found a program that lets you customize the Nvidia driver installer and has an option to enable the correct MSI flag in the registry upon install. Run NVCleanstall with your settings of choice, making sure to open the advanced menu and enable the MSI flag. You can download your driver from the manufacturer, or select it through NVCleanstall.
6. Run msconfig and set it back to normal boot. Restart/shut down.
7. Add your GPU back to the hardware list with the proper hardware flags (again, see the example commands after this list).
8. Boot up and breathe a heavy sigh of relief. Make a post here if this helped you, to add visibility for others searching for help.
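In case you manage the VM from the Proxmox host shell rather than the web UI, steps 1 and 7 look roughly like this. Treat it as a sketch: the VM ID (100) and PCI address (01:00) are placeholders for my setup, and your passthrough flags may differ depending on which guide you followed.

# Step 1: drop the passthrough entry and fall back to an emulated display (std works for noVNC; the GUI's Default does too)
qm set 100 --delete hostpci0
qm set 100 --vga std

# Step 7: hand the whole GPU back to the VM as its primary display
qm set 100 --hostpci0 01:00,pcie=1,x-vga=1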

P.S. There are some concerns about needing to re-enable the MSI flag every time the video driver is updated. If you follow my method completely, drivers will only be installed when you manually update them, and by using NVCleanstall to install them you can manually flag MSI each time. That means there should be absolutely no unexpected downtime or need to redo this.
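If you ever want to double-check that the flag survived a driver update, you can search the registry for it from an elevated Command Prompt inside the VM. This just lists every PCI device that has the value; for your GPU's entry, 0x1 means MSI is enabled. As far as I can tell this is the same MSISupported value that NVCleanstall's option writes, under the GPU instance's Device Parameters\Interrupt Management\MessageSignaledInterruptProperties key.

reg query "HKLM\SYSTEM\CurrentControlSet\Enum\PCI" /s /v MSISupported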
 
Thanks, this fixed an issue where DWM would keep crashing with exception 0xe0464645 after connecting via RDP for the first time, although for me it only partially fixed it. Before, I would get a black screen when connecting via RDP through Guacamole AND through the built-in Windows RDP client. After this, only connecting via Guacamole gave the black screen, and the Windows client connected fine.

To completely fix things I also had to disable Hardware-accelerated GPU Scheduling. This should normally be in the Windows 10/11 display settings panel, in the Graphics Settings sub-panel, but it was missing for me, so I had to add the following key to the registry:

Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers add DWORD "HwSchMode" set to 1
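If you'd rather not click through regedit, the same value can be created from an elevated Command Prompt; as far as I know, 1 corresponds to hardware-accelerated GPU scheduling off and 2 to on, and a reboot is needed afterwards:

reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v HwSchMode /t REG_DWORD /d 1 /f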

For reference, this is on Proxmox 7.2-3.
 

Hey, thanks for the addition to my post. I waited a while before replying to see whether your solution kept working over the weekend. I did some research into your error code and it seems a hardware fault is very likely, and that kind of fault can be mitigated by, e.g., offloading GPU cycles back onto the CPU. Even if your solution has been working, I'd still suggest you check your PCIe slot and cabling just to make sure everything is still seated tightly, and verify your PSU has enough power for your setup.
 
Hi again! Looking to see if you have any further insight. It did continue to work over the weekend, but the problem returned after a week or so: black screens on RDP connections, while the VM otherwise responds fully, and RDP starts working again after the VM is restarted (just the VM, the host doesn't even have to be). I can still connect Event Viewer to it remotely, which is also how I know it's the same problem: "The Desktop Window Manager process has exited (Process exit code: 0xe0464645, Primary display device ID "RDPUDD" Chained DD)".

I have not allowed it to apply any updates except for manually reviewed security updates. I also looked into a hardware fault as you mentioned, but I've been running a continuous stress test for days now without issues, and as long as I keep the same RDP session open it never drops or black-screens out. So I'm going to rule out hardware failure for now.

Appreciate any additional help you or someone else can give.
 
What benchmark(s) have you been running, specifically?
 
