Delay Fencing ipmitool

leonation

New Member
Apr 15, 2015
Hi guys,
I'm using Proxmox VE 2.6.32 with a two-node HA cluster.
As fencing device I use ipmitool.

This is my cluster.conf:
<?xml version="1.0"?>
<cluster config_version="110" name="cluster">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" cipher="1" ipaddr="xxx.xxx.xxx.239" lanplus="1" login="xxx" name="ipmi1" passwd="xxx"/>
    <fencedevice agent="fence_ipmilan" cipher="1" ipaddr="xxx.xxx.xxx.249" lanplus="1" login="xxx" name="ipmi2" passwd="xxx"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="ha1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="ha2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>

My problem now is that I have some network connection issues.
The connection between the two servers can be lost for a few seconds.

Is there any way to delay the fencing?
It should only start, for example, after 30 seconds, once the connection is really lost!

Can you show me how to add that to my cluster.conf? :)

Best Regards
 
You really should, at the very least, be running a quorum disk. IMO your network should be stable, and what you are asking for is nothing more than a band-aid instead of the real fix.
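
That said, if you still want the stopgap, the stock Red Hat cluster stack has a fence daemon option for it. A rough, untested sketch (I'm not sure how well Proxmox's tooling handles this element); it would go inside the <cluster> element, and don't forget to bump config_version:

<!-- wait 30 seconds after a node is declared failed before fencing it -->
<fence_daemon post_fail_delay="30"/>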
 
Hi,
what is the benefit of a quorum disk?
Can you tell me how I configure it?
Is it extra hardware?
 

With what you have right now, if the two nodes lose communication, neither of them can make a proper decision because they are both in the same situation. They would probably end up fencing each other. If you have a quorum disk and the two nodes lose communication, but one node still has communication with the quorum disk, then a proper decision can be made and the node without quorum communication can be fenced.

A better example would be: you have two nodes and the OS crashes on one of them. Without quorum, the remaining node wouldn't be able to fence the crashed node because it only has one vote. If you had a quorum disk configured, the remaining node would still have contact with the quorum disk and be able to make a decision.

Once you get to a cluster that has at least three nodes, quorum disks are no longer needed. Essentially, a cluster needs at least three votes in total so that a clear majority is possible.

The quorum device can be an iSCSI disk presented over the network.
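
As a rough sketch (the device path and label below are placeholders; this assumes a small shared LUN is visible on both nodes as /dev/sdX):

# label the shared LUN as a quorum disk; run once, from one node
mkqdisk -c /dev/sdX -l qdisk1

Then, in cluster.conf, drop two_node="1", raise expected_votes, and reference the label:

<cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
<quorumd votes="1" allow_kill="0" interval="1" label="qdisk1" tko="10"/>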

https://access.redhat.com/documenta...dministration/s1-qdisk-considerations-CA.html

This should get you going in the right direction.

https://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster

Good luck!
 
Hi,
many thanks!!!!
Now I have a quorum disk successfully added to my cluster.

clustat on both servers:
Member Name              ID   Status
------ ----              ---- ------
ha1                       1   Online, rgmanager
ha2                       2   Online, Local, rgmanager
/dev/block/8:33           0   Online, Quorum Disk


cluster.conf:
<?xml version="1.0"?>
<cluster config_version="136" name="cluster">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd votes="1" allow_kill="0" interval="1" label="proxmox1_qdisk" tko="10"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" cipher="1" ipaddr="x.x.x.x" lanplus="1" login="x.x.x.x" name="ipmi1" passwd="x.x.x.x"/>
    <fencedevice agent="fence_ipmilan" cipher="1" ipaddr="x.x.x.x" lanplus="1" login="x.x.x.x" name="ipmi2" passwd="x.x.x.x"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="ha1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="ha2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="118"/>
  </rm>
</cluster>

In the clustat output you can see that all nodes are online.

The web interface shows one of the two servers as "red", and a migration cannot be done.

How can I fix it?


Best regards...
 
A reboot usually fixes that up. If you go to the Services tab in the Proxmox GUI, are they all running?
 
Hi,
all services on all nodes are running.
I restarted HA2 (there are no VMs running on it), but the problem is still there.
I can't restart HA1 (all the VMs are currently running there) because people are actively working with the VMs, and I can't migrate the VMs to HA2 :(

Any other ideas?

Best regards...
 

Sounds like they will be fine as is. I would wait until no one is using them and then give it a try! I've run into the issue quite a few times and a reboot almost always fixes it.

You could try the following, but I am not 100% sure it won't cause issues with your currently running VMs.

/etc/init.d/pve-cluster restart
/etc/init.d/pvestatd restart
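
Afterwards you can sanity-check from each node; both commands are part of a stock Proxmox 2.x/3.x install:

# quorum and vote information
pvecm status
# node and quorum-disk membership
clustat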
 
What you are experiencing is typically caused by enabling HA on an already running VM. To be on the safe side, you should only add a VM to HA when it is stopped. This is also an important test of whether HA works for that specific VM, because as soon as you enable HA for a stopped VM, the HA manager should automatically start it.
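
For example, the HA resource entry in cluster.conf looks like the one you already have (vmid 118 taken from your config); add it while the VM is stopped, and once the new config is activated rgmanager should start the VM on its own:

<rm>
  <!-- add this while VM 118 is stopped; the HA manager should then start it -->
  <pvevm autostart="1" vmid="118"/>
</rm>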
 

That is news to me. I see this happen on clusters which don't have HA enabled on an already running VM. The GUI shows one node as red, but clustat reports them as online.

However, I have seen the migration failures due to HA being enabled on an already running VM.