Problem with HA

nsscorp

New Member
Jan 10, 2013
We have installed an HA cluster with 2 HP servers, using the iLO interfaces for fencing. The system is fully updated and is running version 2.2-31.


From time to time the first node loses communication with the second node (or at least that is my guess). After that the cluster appears broken and we cannot even change the settings of the VMs (we get a "Device or resource busy" message).


The prox01 logs:


rgmanager.log
Jan 09 16:41:34 rgmanager State change: prox02 DOWN


fenced.log
Jan 09 16:41:34 fenced fencing node prox02
Jan 09 16:41:36 fenced fence prox02 success


pvecm nodes
Node Sts Inc Joined Name
1 M 692 2012-12-18 12:15:57 prox01
2 X 716 prox02




The prox02 logs:


rgmanager.log
Jan 09 16:41:37 rgmanager #67: Shutting down uncleanly


fenced.log
Jan 09 16:41:37 fenced cluster is down, exiting


pvecm nodes
cman_tool: Cannot open connection to cman, is it running ?


We also tried to restart cman on the second node and we get the following messages:
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Starting GFS2 Control Daemon: gfs_controld.
Unfencing self... fence_node: cannot connect to cman
[FAILED]


And if we check the cman status, it displays:


root@prox02:/var/run# /etc/init.d/cman status
Found stale pid file
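
For what it is worth, a few commands along these lines can show what is actually still running on prox02 versus what only left a pid file behind (the exact process and pid file names differ between versions, so treat them as things to look at rather than definitive answers):

ps aux | grep -E 'corosync|fenced|dlm_controld|gfs_controld' | grep -v grep
ls -l /var/run/*.pid
cman_tool status

If no corosync/cman process is running but its pid file is still present, that leftover file is most likely what the init script reports as stale.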

Also, if we test fencing from prox01 it always works without a problem, but from prox02 we get:

fence_node: cannot connect to cman
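
To rule out the iLO side without going through cman, the fence agent itself can also be called directly; fence_node needs a running cman, but the agent does not. Roughly like this (adjust the address and login to your setup, and check the agent's help output for the exact option names on your fence-agents version):

fence_ilo -a 10.10.50.201 -l hpcluster -p 'PASSWORD' -o status

A status query only asks the iLO for the power state, so it should be safe to run against a production node.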

If we reboot the two nodes, HA works again without any problem until it breaks again.



Is there any way to recover the cluster without rebooting the two nodes?
How can we solve the cluster problem?
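
For reference, a sequence that might be worth trying on prox02 before a full reboot looks roughly like this (only a sketch, assuming the stale pid file is the main blocker; whether cman stops cleanly in this state is not guaranteed, and rgmanager may already be down):

/etc/init.d/rgmanager stop    # stop the HA resource manager first, if it is still running
/etc/init.d/cman stop         # may hang or complain if fenced/dlm_controld are stuck
rm -f /var/run/<the pid file the init script reported as stale>
/etc/init.d/cman start
/etc/init.d/rgmanager start

If /etc/pve is still read-only afterwards ("Device or resource busy"), an /etc/init.d/pve-cluster restart may also be needed so that pmxcfs reconnects to the running cluster stack.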
 
Hi Tom.

That's why we are using iLO for fencing. Here is my cluster.conf (the passwords are removed for security reasons).

<?xml version="1.0"?>
<cluster config_version="27" name="hfcloud">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="10.10.50.200" login="hpcluster" name="fenceNodeA" passwd="REMOVED"/>
    <fencedevice agent="fence_ilo" ipaddr="10.10.50.201" login="hpcluster" name="fenceNodeB" passwd="REMOVED"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="prox01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="status" name="fenceNodeA"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="prox02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="status" name="fenceNodeB"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm/>
  <totem window_size="100"/>
</cluster>
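
As a side note, changes to this cluster.conf do not require a node reboot either. On this Proxmox generation the usual way is to edit a copy as /etc/pve/cluster.conf.new with an incremented config_version and activate it from the web GUI, and from the command line the new file can be sanity-checked and pushed roughly like this (assuming the standard redhat-cluster tools are present; behaviour may differ slightly between versions):

ccs_config_validate      # check the new cluster.conf against the schema
cman_tool version -r     # ask cman to load the incremented config_version on all nodes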
 
From time to time the first node loses communication with the second node (or at least that is my guess). After that the cluster appears broken and we cannot even change the settings of the VMs (we get a "Device or resource busy" message).

Node prox01 will fence prox02 if communication is broken (in fact both will fence each other). Are you sure fencing works? In your case, a reboot of prox02 should clear the issue?
 
I know that if we reboot the prox02 node the issue will be cleared. Is there any way to avoid a reboot (the system is in production)?

In this case fencing works.


Regards,

Antonis
 
I know that if we reboot the prox02 node the issue will be cleared. Is there any way to avoid a reboot (the system is in production)?

This is just my opinion, but what you do here is extremely dangerous. You run with two_node="1" and use unreliable fencing. I would never do that or suggest such a setup to anyone.
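
To spell out the quorum arithmetic behind that warning: normally cman requires

quorum = floor(total_votes / 2) + 1 = floor(2 / 2) + 1 = 2

so a single surviving node would stop. With two_node="1" and expected_votes="1" the quorum drops to 1, which means that after a communication loss each node still considers itself quorate and both immediately try to fence the other. Surviving such a split then depends entirely on the fencing devices working correctly, which is why pairing two_node="1" with fencing you do not fully trust is dangerous.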
 
