[SOLVED] HUGE Fencing problem with IPMI

1. Fencing means isolating a node that may be operating on its own, outside cluster control. You want this to prevent such a node from writing to the
shared storage and possibly overlapping with a node that is entitled (by the cluster) to write in that area. So when a node is outside cluster control for
whatever reason, you trigger a fencing scenario with the help of a device that can take actions such as power-cycling the box, and hopefully it comes back
online properly into the cluster.
2. The fact that VMs get moved (actually restarted) to the other nodes on reboot is called HA (fencing is issued as a trigger because of your config, and it's mandatory for the
reasons explained in 1., but in your case it doesn't actually do anything). This is handled by rgmanager, because you have a <rm> section in your cluster.conf
which contains the proper settings for the VMs (see the sketch after this list). You can check what's going on in /var/log/cluster/rgmanager.log.
3. Of course you have to take care of the network connectivity issue you are facing before trying ipmitool to get fencing working:
Why do you bridge eth0 and eth1? Since you have STP off on the bridge, is there a possibility that you are creating a loop between eth0, eth1 and an
external switch? Better explain what eth0 is connected to, what eth1 is connected to, and what exactly you want to use them for. You also mentioned
that you chose the BMC to use all the NICs in shared mode with the server, which doesn't sound right combined with the fact that you are bridging all the interfaces. Better
straighten out your network setup first; it might just solve the reachability issue with the BMC.
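
For reference, on PVE 3.x an HA-managed VM shows up in the <rm> section of cluster.conf roughly like this (the vmid is just a placeholder):

Code:
<rm>
   <pvevm autostart="1" vmid="101"/>
</rm>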

I have two LAN switches... eth0 is connected to sw1 and eth1 is connected to sw2 - STP is configured on both switches to kill loops.
The reason eth2 and eth3 are not connected is that I ran out of cables :) but I will remove them on all servers to see if that helps.

The only way I can get IPMI/BMC up and running is by using shared NICs. There is a dedicated option, but HP told me it is only to be used if I have a dedicated iLO port or similar, which this server doesn't have.

I don't think I'm creating loops as long as STP is enabled on both switches, right?

My idea was to bond all 4 physical NICs on each server, with two NICs connected to each switch.

I get your suspicion about looping and this being a network problem, but I just don't see it, since:
1. We have had a connection to the BMC controller before. The fact that we can't now came out of the blue.
2. I DO have connectivity to the BMC controller - by that I mean it IS responding to ping AND I can enter the web interface during bootup until I get to where Proxmox starts CMAN.
 
OH MY F... GOD!!!!!

Thheo you deserve a medal!!!

So I went and removed eth2 and eth3 from all the Proxmox interface definitions... which did nothing after a reboot... so I removed eth1 as well, back to the default (eth0 only), and it WORKED! Now I can ping both my BMC controller AND the LAN interface...

Talk about not seeing the forest for the trees... or something :) But I don't understand why it didn't work when I added the other interfaces?

EDIT:

And fencing is working

Code:
root@proxmox00:~# fence_node proxmox02 -vv
fence proxmox02 dev 0.0 agent fence_ipmilan result: success
agent args: nodename=proxmox02 agent=fence_ipmilan ipaddr=10.10.99.32 lanplus=1 auth=password login=admin passwd=XXXXXXX power_wait=5 method=cycle
fence proxmox02 success
root@proxmox00:~#
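
For reference, the cluster.conf entries behind that output would look roughly like the sketch below; the IP, credentials and options are taken from the agent args above, the element layout is the usual fenced one:

Code:
<clusternode name="proxmox02" votes="1" nodeid="2">
  <fence>
    <method name="1">
      <device name="ipmi2"/>
    </method>
  </fence>
</clusternode>
...
<fencedevices>
  <fencedevice agent="fence_ipmilan" name="ipmi2" ipaddr="10.10.99.32" lanplus="1" auth="password" login="admin" passwd="XXXXXXX" power_wait="5" method="cycle"/>
</fencedevices>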
 
I suspect that the way BMC port sharing on all interfaces works (it has to do some internal bridging), together with bridging the interfaces on the host, causes a loop. You could use a different VLAN for the BMC
(I guess it's configurable) and bring up an eth0.<vlan> sub-interface with a different subnet just to have access to the BMC (see the sketch below). But I would still give up on the all-port sharing for the BMC: use only one port, and preferably with a VLAN.
To me it doesn't make sense to simply bridge the host interfaces, but I don't know your full plan to have the full picture. Glad that you have this part sorted out... you can move on to the remaining problems :)
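
A minimal sketch of such a sub-interface in /etc/network/interfaces, assuming VLAN 99 is used for the BMCs and the 10.10.99.0/24 subnet already seen in this thread (the host address is a placeholder; the vlan package / 8021q module must be available):

Code:
auto eth0.99
iface eth0.99 inet static
    address 10.10.99.10
    netmask 255.255.255.0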
 
About the port sharing:
According to HP support themselves, dedicated is ONLY possible if I have a port for it in my server, which I don't, so I have to use NIC sharing :(
I seem to remember being able to assign a VLAN for the BMC in the BIOS, but I will need to check that when I get back to the datacenter.

So the way I have set up my network wasn't wrong, I assume?

I haven't gotten to the exact networking setup yet. My plan was to just bond all the interfaces over LACP and do some brutal testing of the redundancy by pulling power cords to different parts of the infrastructure, and other stuff like LAN cables...

I will at some point hopefully host client systems on this Proxmox solution. I would segment these clients into VLANs... The way I was advised to do it was to bond all the interfaces and then make an individual bridge for each client with a VLAN tag, and use that interface for the client's VMs... Now I don't remember if it's the other way around (a bridge with multiple bonds, or a bond with multiple bridges), but that was my plan.
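
If it helps, the usual pattern on PVE is the latter: one bond, then one bridge per VLAN tag on top of it. A sketch in /etc/network/interfaces, with VLAN 100 and the interface names as placeholders:

Code:
auto bond0
iface bond0 inet manual
    slaves eth0 eth1
    bond_miimon 100
    bond_mode 802.3ad

auto bond0.100
iface bond0.100 inet manual
    vlan-raw-device bond0

auto vmbr100
iface vmbr100 inet manual
    bridge_ports bond0.100
    bridge_stp off
    bridge_fd 0

Each client then gets its own VLAN ID and a matching vmbrXXX bridge, and the client's VMs attach to that bridge.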

But knowing why I couldn't ping my BMC anymore is a HUGE step towards getting this up and running.
 
Maybe I misread this: "You also have to choose shared nics which means you can reach IPMI web interface on all nics in the server"
Did you choose one NIC or all the NICs for IPMI? I have no idea about iLO... so maybe I am asking something stupid and choosing all NICs isn't possible, but I would recommend sticking with only
one shared NIC, preferably on a dedicated VLAN. For my iDRAC I also use a shared NIC.
For interface bonding you should have a plan for what you will transfer over those links, and maybe separate things: for example NFS, where you need higher throughput,
over dedicated links, and normal traffic on others. It is "best practice" to separate these properly, since you could for example have a DDoS targeted at
the VMs, and it would cause congestion on the links between the host and the storage (see the sketch below).
I would say it didn't make sense to have eth0 and eth1 bridged... unless you make an Ethernet ring, connect these interfaces to 2 different Ethernet switches,
and have all the elements inside this ring run STP (including the bridging instance). You should consider a scenario where you have node and path redundancy:
that means any element or path between them can fail and you still have connectivity.
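
A sketch of that separation with two bonds (interface names and addresses are placeholders; note that an LACP bond split across two switches generally requires the switches to support stacking / multi-chassis LAG):

Code:
# bond0 + vmbr0: VM and management traffic (one link to each switch)
auto bond0
iface bond0 inet manual
    slaves eth0 eth1
    bond_miimon 100
    bond_mode 802.3ad

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    gateway 192.168.1.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0

# bond1: dedicated NFS/storage traffic
auto bond1
iface bond1 inet static
    address 10.0.0.10
    netmask 255.255.255.0
    slaves eth2 eth3
    bond_miimon 100
    bond_mode 802.3ad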
 
Hi Thheo,

In the BIOS for IPMI I can choose to go with either a dedicated NIC or shared NICs... it's a simple menu, dedicated or shared. What HP told me was that dedicated won't work unless I have an iLO port present, which I don't, so I can't use that. I therefore have to use shared NICs, which means I can access my BMC interface from all the NICs in the server, no matter how each is configured network-wise.

My setup is as follows...

I have two WAN connections into my system.

I have two Cyberoam firewalls in an HA configuration for failover to each WAN and the LAN.

I have two network switches where ports 21, 22, 23 and 24 are configured in an active LACP trunk - STP active.

I was planning on having all 4 NICs on my servers in one big LACP group with two NICs connected to each switch. Based on what you say about NFS - and I share your attitude - I would probably go with two NICs for the VM LAN, one connected to each switch, and the last two NICs in the server for the NFS network.

This should mean that I could lose one WAN connection and still be going... lose one firewall and still be going... lose one switch and still be going... lose one node and still be going...

Does this make sense?

Thanks

Casper
 
OK, so today I went back to the datacenter...

All my nodes are now set to only use eth0 for networking, which is how I can keep my BMC responding and working. The fence_node command works and all looks great, so I figured I would try pulling the power cord to see what would happen...

NOTHING... The VMs stayed on the node and no migration was happening. I think I need the external fencing device mir was talking about...
I have looked into this AP7920 from APC... it's an APC Switched Rack PDU 10A 1U 208/230V, which I believe supports fencing according to this site: https://access.redhat.com/site/articles/28603#15. Am I right? Would this work well as an external fencing device?

Also, another EUREKA moment for me today... By accident I inspected NIC1 a little more closely and noticed that it also has "mgmt" printed next to it. I'm thinking maybe HP was wrong and that if I set IPMI in the BIOS to dedicated, NIC1 would become my dedicated IPMI/BMC NIC??? I tried it out both with DHCP and a static IP set in the BIOS, but it's not responding... Any suggestions?

Thanks all..

Casper
 
FYI:

After a long, long talk with HP, clearing up misunderstandings and so on, it appears that if I choose dedicated in the BIOS for IPMI, I need the management port expansion option. If I go with shared, it will use NIC1, and it will be shared with whatever IP config I give that NIC. I still need to test this, but it makes sense... one more problem is dead...
 
You could use an Ethernet-controlled PDU to achieve fencing; however, I am thinking that the only reason for your IPMI not to work would be a power outage on that node. In that case
you could also set up a secondary "fake" fencing device that returns success no matter what, so that rgmanager would restart the services on the other nodes (see the sketch below).
Do you think you need to protect yourself from a situation where that node has no power? In such a case the node doesn't work, so it won't affect the storage, but indeed, if I remember
correctly, if the fence command is not successful then the cluster will not release the lock for rgmanager.
Also, if the node has no power, what is the chance that the PDU has power so you can control it? Do you have dual power supplies in each server?
Could you also check the logs in /var/log/cluster/rgmanager.log and the other logs to see what happens when you turn off the power?
Also, when you manually fence a node and it gets brutally restarted, do the VMs start on the other nodes?
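
To be clear about what such a "fake" device could look like: there is no stock agent for this, but a hypothetical dummy agent is just a script that swallows the arguments fenced passes on stdin and always reports success. It throws away the protection fencing normally gives (see point 1 at the top of the thread), so treat it only as a sketch of the idea:

Code:
#!/bin/bash
# /usr/sbin/fence_dummy_ok (hypothetical name): always "succeeds".
# fenced hands the agent its arguments as key=value lines on stdin;
# this script discards them and pretends the node was fenced.
cat > /dev/null
exit 0

It would then be registered as a <fencedevice> with agent="fence_dummy_ok" and added as a second <method> after the IPMI one for each node.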
 
From product sheet:
"Remote Individual Outlet Control: Remotely manage outlets so users can turn outlets off that are not in use (prevent overloads) or recycle power to locked-up equipment (minimize costly downtime and avoid travel time to equipment)."

So yes, this will do.
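
For reference, a fence setup for such a switched PDU in cluster.conf would look roughly like this, using the stock fence_apc agent (IP, credentials and outlet number are placeholders):

Code:
<fencedevices>
  <fencedevice agent="fence_apc" name="apc1" ipaddr="10.10.99.50" login="apc" passwd="XXXXXXX"/>
</fencedevices>
...
<clusternode name="proxmox02" votes="1" nodeid="2">
  <fence>
    <method name="1">
      <device name="apc1" port="2"/>
    </method>
  </fence>
</clusternode>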
 
thheo wrote:
You could use an Ethernet-controlled PDU to achieve fencing; however, I am thinking that the only reason for your IPMI not to work would be a power outage on that node. In that case you could also set up a secondary "fake" fencing device that returns success no matter what, so that rgmanager would restart the services on the other nodes.
Do you think you need to protect yourself from a situation where that node has no power? In such a case the node doesn't work, so it won't affect the storage, but indeed, if I remember correctly, if the fence command is not successful then the cluster will not release the lock for rgmanager.
Also, if the node has no power, what is the chance that the PDU has power so you can control it? Do you have dual power supplies in each server?
Could you also check the logs in /var/log/cluster/rgmanager.log and the other logs to see what happens when you turn off the power?
Also, when you manually fence a node and it gets brutally restarted, do the VMs start on the other nodes?

Right thheo, if you want to avoid a single point of failure, it will be necessary to use two UPSs for two PDUs, and the cluster configuration file must have both PDUs configured. Red Hat also has this situation documented, please see this link:
https://access.redhat.com/site/docu...ple_-_Fence_Devices/#Dual_Power_Configuration

Or this "Multiple Fence Methods per Node" (more easy for understand and apply to PVE):
https://access.redhat.com/site/docu..._Administration/s1-config-fencing-cli-CA.html

Or this: "Configuring a Backup Fence Device" (the more easy for understand and apply to PVE):
https://access.redhat.com/site/docu...nistration/s2-backup-fence-config-ccs-CA.html

Also, you must think about what happens if a hardware component fails on the mainboard - PCIe slots, the RAID card, any board, etc. These aren't power-failure problems, and in some cases it may be necessary to do a manual fence with human intervention. So I think that as a best practice it is always necessary to have manual_fence enabled as a third option, provided the two previous options are the APC PDUs.
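
For reference, the dual-power configuration Red Hat documents in the first link looks roughly like this: both PDUs are switched off before either is switched back on, so the node really loses power (names and outlet numbers are placeholders):

Code:
<clusternode name="proxmox02" votes="1" nodeid="2">
  <fence>
    <method name="1">
      <device name="apc1" port="2" action="off"/>
      <device name="apc2" port="2" action="off"/>
      <device name="apc1" port="2" action="on"/>
      <device name="apc2" port="2" action="on"/>
    </method>
  </fence>
</clusternode>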

Best regards
Cesar
 
If you use such a device (UPS) you can also use it for fencing.

Hi Dietmar, a pleasure to greet you again.

Please let me ask a question:

In this link I see that Red Hat doesn't say anything about a UPS as a fence device:
https://access.redhat.com/site/articles/28603

Then why do you say that a UPS may be used as a fence device?
And please, if it can be used as a fence device, give me a link about how to configure it in the cluster configuration file.

Best regards
Cesar
 
First off... thanks for the huge help so far from all of you!

I'm confused though... Do I or do I NOT need an external fence device in order to have VMs migrate or boot up on another node if a node loses power, has a motherboard failure, or some other internal server failure that kills it? Or can it be done with fencing on the servers themselves?

I would really like a solution where I wouldn't have to invest in more gear like the PDU or UPS... or whatever :) Therefore what thheo suggests sounds interesting...

secondary fake fencing device
How would you achieve this? And how solid would such a solution be?

thheo wrote:
Do you think you need to protect yourself from a situation where that node has no power? In such a case the node doesn't work, so it won't affect the storage, but indeed, if I remember correctly, if the fence command is not successful then the cluster will not release the lock for rgmanager.
Also, if the node has no power, what is the chance that the PDU has power so you can control it? Do you have dual power supplies in each server?
That is my worry exactly... my servers only have one PSU! I've got 3 of them, so that's OK IF I can get the VMs to start on another node in case of a power failure. My servers are in a hosted rack in a datacenter, so I really don't think I will lose power at any point, which makes it hard to justify buying a PDU. I need a way to compensate for the fact that my servers only have one PSU...
I know that the affected server with the PSU failure won't affect storage, but those VMs would be unavailable until power is restored, and I can't live with that... I need them to start up on another node.

Cesar wrote:
Right thheo, if you want to avoid a single point of failure, it will be necessary to use two UPSs for two PDUs, and the cluster configuration file must have both PDUs configured. Red Hat also has this situation documented, please see this link:
https://access.redhat.com/site/docum..._Configuration

Or this, "Multiple Fence Methods per Node" (easier to understand and apply to PVE):
https://access.redhat.com/site/docum...ng-cli-CA.html

Or this, "Configuring a Backup Fence Device" (the easiest to understand and apply to PVE):
https://access.redhat.com/site/docum...ig-ccs-CA.html

Also, you must think about what happens if a hardware component fails on the mainboard - PCIe slots, the RAID card, any board, etc. These aren't power-failure problems, and in some cases it may be necessary to do a manual fence with human intervention. So I think that as a best practice it is always necessary to have manual_fence enabled as a third option, provided the two previous options are the APC PDUs.

Best regards
Cesar

Thanks for the input, Cesar, I will study those sites when I get the time.

@Mir

Thanks, now I have a backup plan in case I can't get this working without more gear!

Thanks for your inputs once more!

Casper
 
@offerlam:

I have this scenario in a small business:
Only two PVE nodes in HA for the VMs, and without fence devices.
I can always do manual online migration of the VMs without problems (as long as the two nodes are working perfectly).

To get HA, I use only manual_fence, i.e. if a PVE node shows erratic behavior, I manually disconnect the AC power of that PVE node; afterwards, on the CLI of the other PVE node I run the manual fence acknowledgement, and then the VMs of the old PVE node will start on the node that is alive.
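
For context, that manual acknowledgement is normally done with fence_ack_manual on a surviving node, after you are absolutely sure the failed node is powered off (the exact syntax depends on the fence-agents/cman version):

Code:
# run on a surviving node once the failed node is confirmed to be off
fence_ack_manual proxmox02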

I have had this scenario for many years in a production environment without problems, but you can have backup fencing: the "iLO" and the "manual" fence for more security, and obviously the "iLO fence" should be your first option on both PVE nodes.

Best regards
Cesar

Comment: This post has been re-edited
 
The APC "UPS Network Management Card" is listed as the AP9606. But AFAIK that product is discontinued and is now called the AP9630...

Many thanks Dietmar, I will study this option :p. When? I do not know.

Best regards
Cesar
 