1. Fencing means isolating a node that may still be running on its own, outside the cluster's control. You want this to prevent such a node from writing to the
shared storage and possibly overlapping with the writes of a node that is entitled (by the cluster) to write to that area. So when a node falls outside cluster control for
whatever reason, a fencing action is triggered with the help of a device that can do things like power-cycle the box, and hopefully the node then comes back
online and rejoins the cluster properly.
2. The fact that the VMs get moved (actually restarted) on the other nodes when a node reboots is called HA. Fencing is issued as a trigger because of your config, and it is mandatory for the
reasons explained in 1., but in your case it doesn't actually do anything. HA is handled by rgmanager, because you have an <rm> section in your cluster.conf
that contains the proper settings for the VMs. You can check what's going on in /var/log/cluster/rgmanager.log
3. Of course, you have to sort out the network connectivity issue you are facing before trying ipmitool to get fencing working.
Why do you bridge eth0 and eth1? Since you have STP off on the bridge, could you be creating a loop between eth0, eth1 and an
external switch? Please explain what eth0 is connected to and what eth1 is connected to, and what exactly you want to use them for. Also, you mentioned
that you set the BMC to share all the NICs with the server, which doesn't sit well with the fact that you are bridging all the interfaces. Better
straighten out your network setup first; it might just solve the reachability issue with the BMC.
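
To see exactly when the BMC stops answering (for example before and after CMAN starts), a small check like the one below could be run from another node. This is only a rough sketch under assumptions: the address, user and password are placeholders, the helper names are made up for illustration, and `-I lanplus` assumes the BMC speaks IPMI v2.0 over LAN. Note that the password ends up on the command line, so use it for testing only.

```python
import subprocess


def ping_cmd(host, count=3):
    """Build a ping command (Linux syntax) for a basic reachability test."""
    return ["ping", "-c", str(count), host]


def ipmi_status_cmd(host, user, password):
    """Build an ipmitool call that asks the BMC for the chassis power state.

    '-I lanplus' is an assumption (IPMI v2.0 over LAN); adjust for your BMC.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password,
            "chassis", "power", "status"]


def run(cmd, timeout=15):
    """Run cmd; return (ok, output). ok is False on a non-zero exit code,
    a timeout, or when the binary is not installed."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError) as exc:
        return False, str(exc)
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def check_bmc(host, user, password):
    """Ping the BMC first, then try IPMI; print the result of each step."""
    for label, cmd in (("ping", ping_cmd(host)),
                       ("ipmi", ipmi_status_cmd(host, user, password))):
        ok, output = run(cmd)
        print(f"{label}: {'OK' if ok else 'FAILED'}\n{output}\n")


# Example call with placeholder values (not taken from this thread):
# check_bmc("192.0.2.10", "admin", "secret")
```

If ping works but the ipmitool step fails, the problem is more likely in the IPMI settings (user, privileges, IPMI over LAN enabled) than in the network; if both fail only once CMAN is up, that points back at the bridge/NIC setup.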
I have two LAN switches... eth0 is connected to sw1 and eth1 is connected to sw2. STP is configured on both switches to kill loops.
The reason eth2 and eth3 are not connected is that I ran out of cables, but I will remove them on all servers to see if that helps.
The only way I can get IPMI/BMC up and running is by using shared NICs. There is a dedicated option, but HP told me it is only meant to be used if the server has a dedicated port for iLO and such, which this server doesn't.
I don't think I'm creating loops as long as STP is set on both switches, right?
My idea was to bundle all 4 physical NICs on each server, with two NICs connected to each switch.
I get your suspicion about looping and this being a network problem, but I just don't see it, since:
1. We have had a connection to the BMC controller before. The fact that we can't now came out of the blue.
2. I DO have connectivity to the BMC controller; by that I mean it IS responding to ping AND I can open the web interface during bootup, until I get to the point where Proxmox starts CMAN.