PROXMOX node crash on VM reboot

AWoelfel

Member
Feb 17, 2021
Hi PROXMOX community,

This happened yesterday on two different nodes in the same cluster within an hour.
We were doing maintenance on a set of our nodes.
We performed maintenance on four nodes (identical hardware); on two of the four I had the same problem, described below.


While I was restarting the VMs, the node I was working on suddenly vanished from the cluster.
The node came back and we received an IPMI error regarding a bus error on one of the node's GPUs.
The node crashed but, fortunately, rebooted without any further issues.
I suspect that rebooting the VM with the GPU attached somehow crashed the node.


This happened on node A and, after I was done with it, again on node B about an hour later.

I collected the logs and kernel messages but found nothing suspicious.
Most of the logs simply cut off abruptly at the time of the crash.


Has anybody had comparable issues while rebooting VMs with PCIe passthrough?


Alex