False Fencing Issue

Andrew Holybee · May 16, 2017

I came from VMware and have been happy with Proxmox. The one dark spot is that my cluster seems to be doing false fences: it takes down a node at random and fences the VM. Also, if I take down one node manually, it seems to restart all of my nodes, which doesn't make any sense. I check the logs on the node, but they always start after the restart. How can I look at the logs leading up to the fence?

fabian · May 17, 2017

Fencing usually happens because a node loses quorum, which is often caused by unreliable multicast on the cluster network. See https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_cluster_network for details.
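
The quorum point can be made concrete with the majority rule. The sketch below is a minimal illustration, assuming corosync's votequorum defaults with one vote per node and no two_node option or external quorum device; the function names are hypothetical. It shows one possible reason why taking a single node down could drag the others with it: in a two-node cluster the survivor holds 1 of 2 votes and falls below the threshold.

```python
def quorum_threshold(expected_votes):
    """Votes needed for quorum: a strict majority of the expected votes."""
    # Integer division: 2 nodes -> 2, 3 nodes -> 2, 4 nodes -> 3, 5 nodes -> 3
    return expected_votes // 2 + 1


def has_quorum(expected_votes, votes_present):
    """True if the partition that is still up holds a strict majority."""
    return votes_present >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    for nodes in (2, 3, 4, 5):
        # Simulate one node going down, assuming one vote per node.
        survivors = nodes - 1
        print(f"{nodes} nodes, 1 down: quorate={has_quorum(nodes, survivors)}")
```

Only the two-node case loses quorum when a single node goes down; with three or more nodes, one failure alone should not cost the rest of the cluster its majority, so repeated fencing there points back at the cluster network.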

You can enable persistent journal logging with "mkdir /var/log/journal; systemctl restart systemd-journald". After that, you should be able to list the recorded boots with "journalctl --list-boots".
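
Once the journal is persistent, the log of the boot before the most recent restart can be read with "journalctl -b -1". As a sketch, the helper below builds such a command, narrowed to units that are commonly involved in cluster membership and HA fencing on Proxmox VE. The unit list and the function name are assumptions for illustration; dropping the -u filters shows everything logged during that boot.

```python
import shutil
import subprocess

# Units commonly involved in cluster membership and HA fencing on Proxmox VE.
# This list is an assumption for illustration; extend or trim it as needed.
CLUSTER_UNITS = ("corosync", "pve-cluster", "pve-ha-lrm", "pve-ha-crm", "watchdog-mux")


def previous_boot_cmd(offset=-1, units=CLUSTER_UNITS):
    """Build a journalctl argv for a past boot (-1 = the boot before the current one)."""
    cmd = ["journalctl", "-b", str(offset), "--no-pager"]
    for unit in units:
        # Several -u options are combined, so entries from any listed unit are shown.
        cmd += ["-u", unit]
    return cmd


if __name__ == "__main__":
    if shutil.which("journalctl"):
        # Needs persistent journaling and at least one reboot since enabling it.
        subprocess.run(["journalctl", "--list-boots", "--no-pager"], check=False)
        subprocess.run(previous_boot_cmd(), check=False)
    else:
        print("journalctl not found; run this on a Proxmox VE node")
```

The last entries of that boot, just before the log stops, are usually the ones worth reading after an unexpected fence.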

Andrew Holybee · May 17, 2017

We had issues with everything being on the same PowerConnect switch, so we moved Ceph storage to a dedicated switch, a Netgear JGS524Ev2-200NAS. I followed Netgear's white paper: I disabled IGMP snooping status, set "block unknown multicast address" to disable, and set the broadcast forwarding method to hardware. The fencing seems random, so I am hoping the logs will tell me what's going on. Will persistent logging affect performance?

fabian · May 18, 2017

Andrew Holybee said:
... on, will the persistent logging affect performance?

No, unless you have abysmal storage for /var/log and a crazy amount of logging.

Andrew Holybee · May 26, 2017

I am going to be running omping and just want to make sure I get the syntax right.
So I click on node 1, then Shell,
then run
omping -c 10000 -i 0.001 -F -q 10.10.10.1 10.10.10.2 ... or do I need to have the name of the node in there as well, as in
omping -c 10000 -i 0.001 -F -q PM01-10.10.10.1 pm02-IP 10.10.10.2 etc?

This is for Ceph; do I need to run omping for both the normal LAN and Ceph?

manu · Jun 6, 2017

You have to run omping on the cluster network, i.e. ping the nodes' IPs as they appear in /etc/pve/corosync.conf after each "name" entry.
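
As a sketch of that advice, the snippet below pulls the ring0_addr values out of a corosync.conf nodelist and builds the omping command from the question. It assumes the corosync 2.x nodelist layout, where each node block has a name entry followed by a ring0_addr entry; the sample config, node names and function names are hypothetical. The node addresses (or resolvable hostnames) go on the command line as plain arguments with no node name prefixed, and omping generally needs to be running on every listed node at roughly the same time for the test to report results.

```python
import re
import shlex

# Matches "ring0_addr: 10.10.10.1" lines in a corosync 2.x style nodelist;
# other ringN_addr keys are ignored in this sketch.
RING0_RE = re.compile(r"^[ \t]*ring0_addr[ \t]*:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def ring0_addresses(corosync_conf):
    """Return the ring0 address of every node in the nodelist, in file order."""
    return RING0_RE.findall(corosync_conf)


def omping_cmd(addresses, count=10000, interval=0.001):
    """Build the omping argv used in the question for the given node addresses."""
    return ["omping", "-c", str(count), "-i", str(interval), "-F", "-q", *addresses]


# Hypothetical two-node nodelist for illustration only.
SAMPLE = """
nodelist {
  node {
    name: pm01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
  node {
    name: pm02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
}
"""

if __name__ == "__main__":
    print(shlex.join(omping_cmd(ring0_addresses(SAMPLE))))
```

For the sample above this prints the same command as the first variant in the question, with only the addresses listed.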
