PROXMOX node crash on VM reboot

AWoelfel

Member
Feb 17, 2021
Hi PROXMOX community,

This happened yesterday on two different nodes in the same cluster within an hour.
We were doing maintenance on a set of our nodes.
We serviced four nodes (identical hardware); on two of the four I had the same problem, described below.


While restarting the VMs, the current node suddenly vanished from the cluster.
The node came back and we received an IPMI error regarding a bus error on one of the node's GPUs.
The node crashed but, fortunately, rebooted without any further issues.
I suspect rebooting the VM with the GPU attached somehow crashed the node.


This happened on node A and, after I was done with it, again on node B about an hour later.

I collected the logs and kernel messages but found nothing suspicious.
Most of the logs simply cut off abruptly at the time of the crash.


Has anybody had comparable issues while rebooting VMs with PCIe passthrough?


Alex
 
