VM crashes with passed-through hardware, no log found

ChAoS

Apr 29, 2021
Hello Forum,

I'm having the following problems and hope someone can help.

I have been running Proxmox on my old Core i7-3770 on a Gigabyte Q77M board for years.
Some hardware was passed through to a VM without any problems:
- RAID controller
- DVB satellite capture card
- Gigabit network card
All are PCIe cards.
With Proxmox 6.x and a Windows Server 2008 R2 guest, everything ran fine: months of uptime without a single crash.

Now I have rearranged everything:
- Upgraded Proxmox to 7.x (tested with the current version).
- Removed the passed-through Gigabit Ethernet card completely.
- Installed a new Windows Server 2022 (180-day trial) for testing and passed the two other cards (RAID and satellite) into this VM.
- Installed a second Windows 10 VM with a "new" passed-through GeForce GT1050 with 2 GB of RAM.
Then the problems started:

The server VM suddenly rebooted while the hardware RAID controller was under heavy use. Well, once doesn't count, I thought.

The problems are much worse with the W10 VM. When I watch videos or otherwise use it, it sometimes crashes three times during a 90-minute video, and sometimes not even once during two hours of video.

The screen goes black, the VM reboots, and it is back after 15 to 20 seconds.

The Windows event log shows nothing.
dmesg shows nothing.

The same thing happened twice more afterwards while using the DVB card in the server VM.

I tried many things to narrow it down: swapping cards between slots, using and not using risers, limiting the GeForce to x1 with a piece of paper, and so on.
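To confirm that a trick like the paper strip actually changed the negotiated link, the current width can be read from `lspci -vv` (run as root, otherwise the capability lines are hidden). A minimal sketch, assuming the usual pciutils `LnkSta:` line format; the function name `link_width` and the slot address `01:00.0` are placeholders for illustration.

```shell
#!/bin/sh
# Print the negotiated PCIe link width (e.g. "x1") from `lspci -vv` output on stdin.
# Only the LnkSta line (current link state) is used; LnkCap shows the card's maximum instead.
link_width() {
  sed -n 's/.*LnkSta:.*Width \(x[0-9]*\).*/\1/p'
}

# Usage (as root; 01:00.0 is a placeholder for the GeForce's slot from `lspci`):
#   lspci -vv -s 01:00.0 | link_width
```

If the paper trick worked, the output should read `x1` while `LnkCap` on the same device still advertises the card's full width.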

I also set the following option, which is meant to keep the VM from crashing on unhandled MSR accesses:
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
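For reference, `ignore_msrs` is a parameter of the `kvm` module: the modprobe.d file only takes effect on the next module load, but the parameter can also be checked and set at runtime through sysfs. A sketch, assuming root for the real paths; `CONF_DIR`, `PARAM_DIR` and the function names are hypothetical, added so the functions can be tried outside `/etc` and `/sys`. Leaving `report_ignored_msrs` at its default (`Y`) keeps a kernel log line for every ignored MSR access, which may be one of the few traces left if a guest trips over one.

```shell
#!/bin/sh
# CONF_DIR / PARAM_DIR are hypothetical overrides for testing; defaults are the real locations.
CONF_DIR="${CONF_DIR:-/etc/modprobe.d}"
PARAM_DIR="${PARAM_DIR:-/sys/module/kvm/parameters}"

# Persist the option for future loads of the kvm module (same effect as the echo above).
persist_ignore_msrs() {
  echo "options kvm ignore_msrs=1" > "$CONF_DIR/kvm.conf"
}

# Apply it to the already-loaded kvm module without a reboot (root required on a real host).
apply_ignore_msrs_now() {
  echo 1 > "$PARAM_DIR/ignore_msrs"
}

# Show the active values; the kernel reports booleans as Y or N.
show_msr_params() {
  for p in ignore_msrs report_ignored_msrs; do
    printf '%s=%s\n' "$p" "$(cat "$PARAM_DIR/$p" 2>/dev/null || echo unavailable)"
  done
}

# Usage on the host (as root):
#   persist_ignore_msrs; apply_ignore_msrs_now; show_msr_params
```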

I also blacklisted the host drivers of all passed-through devices and bound their hardware IDs to vfio.
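A common way to do this is a modprobe.d file that hands the devices' vendor:device IDs to `vfio-pci` and blacklists the host drivers, followed by rebuilding the initramfs and checking `lspci -nnk`. A sketch under those assumptions; the IDs `1234:5678`/`1234:5679`, the driver name `nouveau`, `CONF_DIR` and the function names are placeholders, not the actual values for these cards.

```shell
#!/bin/sh
# CONF_DIR is a hypothetical override for testing; the real location is /etc/modprobe.d.
CONF_DIR="${CONF_DIR:-/etc/modprobe.d}"

# $1: comma-separated vendor:device IDs (as shown by `lspci -nn`).
# Remaining args: host drivers to blacklist so they don't grab the devices first.
write_vfio_conf() {
  ids="$1"; shift
  {
    printf 'options vfio-pci ids=%s\n' "$ids"
    for drv in "$@"; do
      printf 'blacklist %s\n' "$drv"
    done
  } > "$CONF_DIR/vfio.conf"
  # On the real host, follow up with: update-initramfs -u -k all   (then reboot)
}

# Read `lspci -nnk` output on stdin and print the driver bound to each listed device.
bound_drivers() {
  sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Usage (placeholder values):
#   write_vfio_conf 1234:5678,1234:5679 nouveau
#   lspci -nnk -s 01:00 | bound_drivers    # every function should report vfio-pci
```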

As mentioned in an older post, I tried downgrading the kernel to 5.15.5-1. The IO delay did go down, but the W10 VM still crashed twice within 40 minutes.


Please, if anyone has ideas, let me know.
 
Check your memory: in my experience, passthrough problems cause crashes immediately, not after some time. Crashes that only show up after a while suggest something related to stress, temperature, power or load, not (only) a passthrough issue. Also check whether your power supply can handle all devices at full load at the same time. Maybe it is overloaded now that the VMs are using more devices simultaneously.
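Since the crashes take a while to appear, logging temperatures over time and comparing them with the crash times may help confirm or rule out the thermal angle. A minimal sketch reading the kernel's hwmon sysfs interface (values there are in millidegrees Celsius); `HWMON_ROOT`, the function names, the 30-second default interval and the `temps.log` file name are arbitrary choices for illustration. It does not cover the PSU, and the passed-through GeForce is not visible to the host's sensors while it is bound to vfio-pci.

```shell
#!/bin/sh
# HWMON_ROOT is a hypothetical override for testing; the real location is /sys/class/hwmon.
HWMON_ROOT="${HWMON_ROOT:-/sys/class/hwmon}"

# Print one line per sensor: "<chip name> <sensor file> <whole degrees C>".
read_temps() {
  for f in "$HWMON_ROOT"/hwmon*/temp*_input; do
    [ -r "$f" ] || continue              # skips the literal pattern when nothing matches
    dir=${f%/*}
    name=$(cat "$dir/name" 2>/dev/null || echo unknown)
    milli=$(cat "$f")
    printf '%s %s %d\n' "$name" "${f##*/}" $((milli / 1000))
  done
}

# Append a timestamped sample every $1 seconds (default 30) to $2 (default ./temps.log).
# Runs until interrupted, e.g. start it in a screen/tmux session before watching a video.
log_temps() {
  interval=${1:-30}; out=${2:-temps.log}
  while :; do
    read_temps | sed "s/^/$(date '+%Y-%m-%d %H:%M:%S') /" >> "$out"
    sleep "$interval"
  done
}
```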
 
Windows Eventlog tells nothing.
dmesg tells nothing.
That smells like a hardware error. As suggested, check the memory. Blow out the dust and reseat the modules. Try another power supply to rule it out; the older a PSU gets, the more likely it is to fail outright or cause strange crashes.
Reseat and double-check all cables; SATA cables in particular can be wobbly.
Also check the capacitors: https://www.pcstats.com/articles/195/index.html
If you haven't found anything so far, you could try the reset jumper on the motherboard. Sometimes that's all it takes.
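Even when a plain `dmesg` looks clean, it can be worth filtering the host's kernel log specifically for hardware-error markers, because a single corrected PCIe or machine-check error is easy to miss among many lines. A sketch; the function name `hw_errors` is hypothetical and the pattern list is an assumption covering typical messages from the kernel's MCE, PCIe AER and Intel IOMMU (DMAR) code, not an exhaustive list.

```shell
#!/bin/sh
# Filter kernel log lines on stdin for common hardware-error markers:
# machine checks (MCE), PCIe AER errors and Intel VT-d (DMAR) faults.
# grep exits with status 1 when nothing matches, which here means "no markers found".
hw_errors() {
  grep -E -i 'mce:|machine check|hardware error|AER:|PCIe Bus Error|DMAR:.*fault'
}

# Usage on the host:
#   journalctl -k -b | hw_errors       # current boot
#   journalctl -k -b -1 | hw_errors    # previous boot, if the journal is persistent
```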

Did you upgrade from an older version, or is this a fresh install of 7.x?
If nothing helps, try a fresh install and run only the minimal configuration (without any cards) for a while (maybe 24 hours or so), then add the cards back one at a time.
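If the cards stay physically installed, the "one at a time" step can also be done on the VM side with Proxmox's `qm set`. A dry-run sketch that only prints the commands for each stage; the VM ID `100`, the slot addresses `01:00`/`02:00` and the function name `plan_passthrough` are placeholders, and `pcie=1` assumes the VM uses the q35 machine type.

```shell
#!/bin/sh
# Print (not run) the qm commands for re-adding passthrough devices one stage at a time.
# $1: VM ID; remaining args: PCI addresses in the order they should be added.
plan_passthrough() {
  vmid=$1; shift
  n=0
  for addr in "$@"; do
    printf 'stage %d: qm set %s --hostpci%d %s,pcie=1\n' "$((n + 1))" "$vmid" "$n" "$addr"
    n=$((n + 1))
  done
}

# Usage with placeholder values (VM 100, slots 01:00 and 02:00):
#   plan_passthrough 100 01:00 02:00
# A device can be taken out again with: qm set 100 --delete hostpci0
```

Running each stage for a day or so before moving to the next keeps the soak-test idea intact while changing only one variable at a time.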
 
