I set up a new cluster with Ceph and have separate networks for Proxmox management, Ceph, and VM public access. I was testing failover and fencing and saw something that didn't look correct to me, so I wanted to check. I am using the default softdog. After I disconnected the Proxmox management network, softdog appeared to do its job and rebooted the node; I was able to watch the VM on that node fail over to another node, and I even received an email alert that the node had been fenced. The issue is that, looking at Ceph, I could see the OSDs from that node come back online, and while logged in to that node through iDRAC I ran the following command, and it looks to me like the VM was running again on that node. If the Ceph OSDs reconnect, that seems to be the exact scenario you are trying to avoid with the fencing process.

When I saw this, I immediately restored its management network so it could regain quorum and ran the command again, and it no longer showed the VM as running on that node. So my question is: should I expect to see this until the management network is restored, and is that okay, or did I miss a setting somewhere?
