Host crashing when GPU passed to VM is under load

WolfieDaFloof

New Member
Aug 1, 2023
I'm not sure how much help anyone can be, but I've run into a very annoying issue that I've managed to narrow down to PCI passthrough. I've set up a Windows 10 VM, successfully passed through an Nvidia GTX 980, and have all of the drivers working. I access the system using Parsec.

At times the VM becomes unreachable, and I receive notifications through CheckMK from the other 2 hosts in the cluster that the host the VM is running on is no longer online. After some testing, I've found that the host stays stable as long as the VM is idling. However, when I put a load on the GPU, such as running a game or encoding media, the host locks up anywhere between 5 minutes and 5 hours after the load starts. Checking the console, I see errors saying "NMI watchdog detected hard LOCKUP on cpu8" and "watchdog: BUG: soft lockup - CPU#16 stuck for 100s! [kworker/u40:0:870237]".

I've also attached an image of the console during a lockup showing these errors. I'm a little unsure what I can do to solve this, so I'm hoping someone can help.
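
In case it helps anyone compare lockups across crashes, here is a rough Python sketch that pulls the CPU number, stall time, and task name out of watchdog lines like the two quoted above. The regular expressions are only modelled on those two messages, and the `parse_lockups` helper and `sample` list are made up for illustration; other kernel versions may word the messages differently.

```python
import re

# Patterns modelled on the two console messages quoted in this post.
# Assumption: other kernels may format these lines differently.
HARD = re.compile(r"NMI watchdog.*hard LOCKUP on cpu\s*(\d+)", re.IGNORECASE)
SOFT = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+)\]")


def parse_lockups(lines):
    """Return one dict per watchdog lockup message found in `lines`."""
    events = []
    for line in lines:
        m = HARD.search(line)
        if m:
            events.append({"kind": "hard", "cpu": int(m.group(1))})
            continue
        m = SOFT.search(line)
        if m:
            events.append({
                "kind": "soft",
                "cpu": int(m.group(1)),
                "seconds": int(m.group(2)),
                "task": m.group(3),
            })
    return events


# The two messages from the console, typed out by hand.
sample = [
    "NMI watchdog detected hard LOCKUP on cpu8",
    "watchdog: BUG: soft lockup - CPU#16 stuck for 100s! [kworker/u40:0:870237]",
]

for event in parse_lockups(sample):
    print(event)
```

Running the same function over the console text from several crashes would show whether the same CPUs or kernel workers keep turning up.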
 

Attachments

  • console at lockup.jpg
    727.9 KB · Views: 5
