Hi PROXMOX community,
this happened yesterday on two different nodes in the same cluster within an hour.
We performed maintenance on a set of our nodes.
We serviced four nodes with identical hardware; on two of the four I had the same problem, described below.
While restarting the VMs, the node I was working on suddenly vanished from the cluster.
The node came back and we received an IPMI error regarding a bus error on one of the node's GPUs.
The node crashed but, fortunately, rebooted without any further issues.
I suspect rebooting the VM with the GPU attached somehow crashed the node.
This happened on node A and, after I was done with it, again on node B about an hour later.
I collected the logs and kernel messages but found nothing suspicious.
Most of the logs simply cut off abruptly at the time of the crash.
Has anybody had comparable issues while rebooting VMs with PCIe passthrough?
Alex