[SOLVED] Ceph Network Failure --> VM Frozen


Mar 7, 2021
I have a 3-node PVE test cluster (6.4.4) with identical nodes and 4 networks, set up as a mesh network with broadcast.
WebAccess Network (192.168.178.x) Card 1 1GBit/s
PVE HA Network (192.168.168.x) Card 1 1GBit/s
Ceph Public Network (192.158.158.x) Card 2 10GBit/s
Ceph BackEnd Network (192.168.148.x) Card 2 10GBit/s
Under normal operation everything works fine: HA VMs, automatic failover, migration, etc.

I am trying to simulate a network failure on the Ceph networks, since all 4 network interfaces are on one card.

So I started a VM in HA mode on Node 1, with its OS disk located on Ceph storage. After the VM had started successfully, I unplugged all Ethernet cables belonging to the Ceph networks (the HA link stayed active).

Expectation: Ceph health warning, and the VM restarts on a different node after 2 - 3 minutes.

What happens: the Ceph health warning appears, but the VM is stuck and unresponsive on Node 1.

Is there a way to enable auto-migration in this case?
There is no such detection built in at the moment. As long as PVE's cluster communication works, the node won't get fenced.
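
Since nothing is built in, one possible workaround is a small watchdog script on each node that probes the other nodes over the Ceph network and reacts once all of them stay unreachable for several checks in a row. The following is only a minimal sketch, not a tested setup: the function names, the threshold, the interval and the example peer addresses are all assumptions, and what `on_failure` should actually do (for example, deliberately taking the node out of the cluster so that HA recovers the VM elsewhere) is left open here and has to be decided per setup.

```python
import subprocess
import time
from typing import Callable, Iterable, Optional


def peer_reachable(ip: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request (Linux `ping`) and report whether it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ceph_network_down(peers: Iterable[str],
                      probe: Callable[[str], bool] = peer_reachable) -> bool:
    """Treat the Ceph network as down only if no peer at all answers."""
    return not any(probe(ip) for ip in peers)


def watch(peers: Iterable[str],
          on_failure: Callable[[], None],
          threshold: int = 3,
          interval_s: float = 10.0,
          probe: Callable[[str], bool] = peer_reachable,
          sleep: Callable[[float], None] = time.sleep,
          max_checks: Optional[int] = None) -> bool:
    """Call on_failure() after `threshold` consecutive failed checks.

    Returns True if on_failure() was triggered, False if max_checks
    was reached first. With max_checks=None the loop runs until it triggers.
    """
    peers = list(peers)
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if ceph_network_down(peers, probe):
            failures += 1
            if failures >= threshold:
                on_failure()
                return True
        else:
            # Any successful check resets the counter, so a short hiccup
            # on the link does not trigger the reaction.
            failures = 0
        sleep(interval_s)
    return False


# Hypothetical usage on Node 1; the peer addresses are made up and would be
# the Ceph-network addresses of the other two nodes:
#   watch(["192.168.148.2", "192.168.148.3"], on_failure=my_reaction)
```

Requiring several consecutive failures and resetting the counter on any success is meant to avoid reacting to a brief link flap. Note that this only detects the failure; with the Ceph disks unreachable, a live migration off the affected node is unlikely to succeed, so the reaction probably has to be closer to fencing than to migrating.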