[SOLVED] Ceph Network Failure --> VM Frozen

Tux_1024

Member
Mar 7, 2021
Germany
I have a 3-node PVE test cluster (6.4.4) with identical nodes and 4 networks, set up as a mesh network with broadcast:
WebAccess Network (192.168.178.x) Card 1 1GBit/s
PVE HA Network (192.168.168.x) Card 1 1GBit/s
Ceph Public Network (192.158.158.x) Card 2 10GBit/s
Ceph BackEnd Network (192.168.148.x) Card 2 10GBit/s
Under normal operation everything works fine: HA VMs, automatic failover, migration, etc.

I am trying to simulate a network failure on the Ceph networks, since all 4 network interfaces are on one card.

So I started a VM in HA mode on Node 1, with its OS disk located on Ceph storage. After the VM had started successfully, I unplugged all Ethernet cables belonging to the Ceph networks (the HA link stayed active).

Expectation: Ceph health warning, and the VM restarts on a different node after 2 - 3 minutes.

What happens: The Ceph health warning is there, but the VM is stuck and unresponsive on Node 1.

Is there a way to enable automatic migration in this case?
 
Hi,
there is no such detection built in at the moment. As long as PVE's cluster communication works, the node won't get fenced.
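
If you want to experiment with detecting this yourself, an external check is one option. The following is only a rough sketch, not a PVE feature: it pings the other nodes on the Ceph public network and, if none of them answer, prints a suggested `ha-manager relocate` command. The peer addresses, the service ID `vm:100`, and the target node name `node2` are placeholders I made up for illustration, and the helper names are hypothetical. Keep in mind that relocating a VM whose disk I/O is already blocked may itself hang, so treat the output as a hint for the operator rather than something to run blindly.

```python
import subprocess
from typing import Callable, Iterable, List


def ping(host: str, timeout_s: int = 1) -> bool:
    """Return True if one ICMP echo to host succeeds (Linux iputils ping)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping binary missing or the call hung: treat as unreachable
        return False
    return result.returncode == 0


def ceph_network_down(peers: Iterable[str],
                      probe: Callable[[str], bool] = ping) -> bool:
    """True only if there is at least one peer and none of them respond."""
    peer_list = list(peers)
    return bool(peer_list) and not any(probe(p) for p in peer_list)


def relocate_command(sid: str, target_node: str) -> List[str]:
    """Build (but do not run) an 'ha-manager relocate' command line."""
    return ["ha-manager", "relocate", sid, target_node]


def main() -> None:
    # Placeholder addresses of the other two nodes on the Ceph public network.
    ceph_peers = ["192.158.158.2", "192.158.158.3"]
    if ceph_network_down(ceph_peers):
        cmd = relocate_command("vm:100", "node2")  # placeholder SID and node
        print("Ceph network unreachable; suggested action:", " ".join(cmd))


if __name__ == "__main__":
    main()
```

The probe is passed in as a parameter so the decision logic can be tried out without touching the network, for example `ceph_network_down(["a", "b"], probe=lambda host: False)`.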
 