False Fencing Issue

Andrew Holybee

Mar 27, 2017
I came from VMware and have been happy with Proxmox. The one dark spot is that my cluster seems to be doing false fences: it randomly takes down a node and fences the VMs. Also, if I take down one node manually, it seems to restart all of my nodes, which doesn't make any sense. I check the logs on the node, but they only start after the restart. How can I look at the logs leading up to the fence?
 

Fencing usually happens because of a loss of quorum, which is often caused by unreliable multicast on the cluster network. See https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_cluster_network for details.

You can enable persistent journal logging with "mkdir /var/log/journal; systemctl restart systemd-journald". After that, you should be able to list boot intervals using "journalctl --list-boots".
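Once persistent logging is on and a fence has happened, something like the following can pull up the boot before the current one. This is only a sketch: the function name is made up for illustration, and the unit names (corosync, pve-cluster, pve-ha-lrm, pve-ha-crm, watchdog-mux) are the usual ones on a PVE node but are an assumption about your setup.

```shell
#!/bin/sh
# Sketch: print cluster/HA-related journal entries from the previous boot.
# Only works if persistent journaling was enabled BEFORE the fence happened.

# Hypothetical helper name; adjust the unit list to what runs on your node.
prev_boot_cluster_logs() {
    # -b -1      : the boot before the current one
    # -u UNIT    : only entries from that systemd unit (several -u are ORed)
    # --no-pager : plain output, easy to redirect into a file
    journalctl -b -1 --no-pager \
        -u corosync -u pve-cluster -u pve-ha-lrm -u pve-ha-crm -u watchdog-mux
}

# Usage on a node after it has come back up:
#   journalctl --list-boots
#   prev_boot_cluster_logs | tail -n 200
```

The last lines before the reset are usually the interesting ones, so piping through tail keeps the output manageable.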
 
We had issues with everything being on the same power connect switch, so we moved Ceph storage to a dedicated switch, a Netgear JGS524Ev2-200NAS. I followed Netgear's white paper: I disabled IGMP snooping status, set block unknown multicast address to disabled, and set the broadcast forwarding method to hardware. The fencing is seemingly random, so I am hoping the logs will tell me what's going on. Will the persistent logging affect performance?
 
I am going to run omping and just want to make sure I get the syntax right.
So I click on node 1, then Shell,
then run
omping -c 10000 -i 0.001 -F -q 10.10.10.1 10.10.10.2 ... Or do I need to include the node names as well, as in
omping -c 10000 -i 0.001 -F -q PM01-10.10.10.1 pm02-IP 10.10.10.2 etc?

This is for Ceph. Do I need to run omping for both the normal LAN and Ceph?
 
You have to run omping on the cluster network
(i.e. ping the nodes' IPs as they appear in /etc/pve/corosync.conf after each "name" entry).
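To avoid typing the addresses by hand, a small helper can read them out of the corosync config. This is a rough sketch, assuming each node block carries its address in a "ring0_addr:" line (check your own file, the key may differ); the function names are made up for illustration. The omping flags are the ones from the post above, and omping has to be started on all nodes at roughly the same time.

```shell
#!/bin/sh
# Sketch: build the omping command line from the addresses in corosync.conf.
# Assumption: node addresses are stored as "ring0_addr: <ip-or-hostname>".

# Print all ring0_addr values of the given config file, space separated.
cluster_addrs() {
    awk '$1 == "ring0_addr:" { printf "%s%s", sep, $2; sep = " " }
         END { print "" }' "$1"
}

# Print (not run) the omping command for the given config file.
omping_cmd() {
    echo "omping -c 10000 -i 0.001 -F -q $(cluster_addrs "$1")"
}

# Usage on a node:
#   omping_cmd /etc/pve/corosync.conf
# then paste the printed command into a shell on every node at the same time.
```

Printing the command instead of running it makes it easy to check the address list first and to paste the same line on each node.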
 
