HA error behaviour

janzun

I'm testing HA with this scenario:

- 3 x PVE 2.1, fresh install, fully upgraded
- Fencing via iLO/IPMI working OK, on a dedicated interface
- 2 x KVM Ubuntu Server 12.04 VMs

First I configured HA for the Ubuntu VMs, and HA placed both on the same server. Then I isolated that server by unplugging its Ethernet cables (bonded interface). The cluster detected the node as down (Aug 01 15:10:31 rgmanager State change: virtualnc02 DOWN), but nothing happened with the VMs: they did not start on another node. Finally I plugged the cables back in; the VMs then showed as shut down in the web interface, and I had to remove them from HA before I could power them up again.

The whole time, rgmanager logs:

Aug 01 15:11:54 rgmanager [pvevm] VM 100 is running
Aug 01 15:11:54 rgmanager [pvevm] VM 101 is running

but that is false.
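To line up the moment rgmanager noticed the node going down with its later "is running" reports, a small filter along these lines might help. It is only a sketch: it assumes log lines in the format quoted above arrive on stdin, and rgmanager_ha_events is a made-up helper name. The thread does not say which log file the lines come from, so no path is assumed here.

```shell
# Sketch: keep only rgmanager node state changes and [pvevm] status reports.
# Reads log lines on stdin; the line format is assumed to match the excerpts above.
rgmanager_ha_events() {
    grep -E 'rgmanager (State change:|\[pvevm\])'
}
```

Feeding it the relevant log file shows the DOWN event and the VM status reports in chronological order, which makes the contradiction easy to see.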

And clustat says the HA VMs (100 and 101) are on node 'virtualnc03', but they are really on 'virtualnc02', the node that was isolated:

root@virtualnc02:~# clustat
Cluster Status for VIRTUALNC @ Wed Aug 1 15:48:23 2012
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 virtualnc03           1 Online, rgmanager
 virtualnc04           2 Online, rgmanager
 virtualnc02           4 Online, Local, rgmanager

 Service Name       Owner (Last)       State
 ------- ----       ----- ------       -----
 pvevm:100          virtualnc03        started
 pvevm:101          virtualnc03        started
root@virtualnc02:~# ssh virtualnc03 qm list
root@virtualnc02:~# qm list
VMID NAME       STATUS     MEM(MB)    BOOTDISK(GB) PID
 100 prueba     stopped    512               15.00 0
 101 prueba2    stopped    512               15.00 0
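The mismatch above (clustat reports an owner and "started", while qm list shows the VMs stopped) can be checked mechanically. The following is a minimal sketch, assuming output in the format shown above; ha_service_owners and qm_status_of are hypothetical helper names, and both read the command output on stdin so they can be tried on saved output as well.

```shell
# Sketch: list HA-managed VMs with the owner and state that clustat reports.
# Reads clustat output on stdin and prints "VMID OWNER STATE" per pvevm service.
ha_service_owners() {
    awk '$1 ~ /^pvevm:/ {
        vmid = $1; sub(/^pvevm:/, "", vmid)
        owner = $2; gsub(/[()]/, "", owner)   # a last owner may appear in parentheses
        print vmid, owner, $3
    }'
}

# Sketch: print the STATUS column of qm list output (stdin) for one VMID.
qm_status_of() {
    awk -v id="$1" '$1 == id { print $3 }'
}
```

For example, `clustat | ha_service_owners` gives the owner rgmanager believes in, and `ssh <owner> qm list | qm_status_of 100` shows what that node actually reports for VM 100. If the first says "started" and the second prints nothing or "stopped", rgmanager's view is stale.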


So HA is behaving erratically for me. Any ideas?
 
What is the output of

# fence_tool ls

Most likely your fencing device is not working, because it does not have power.
 
Hi Dietmar,

root@virtualnc03:~# fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 4
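For repeated tests it can be handy to check this output from a script. The sketch below assumes the `fence_tool ls` layout shown above and reads it on stdin; the helper names are made up, and treating "victim count 0" together with "wait state none" as "no fencing pending" is my reading of the output, not something confirmed in this thread.

```shell
# Sketch: print the victim count from fence_tool ls output (stdin).
fence_victim_count() {
    awk '$1 == "victim" && $2 == "count" { print $3 }'
}

# Sketch: succeed only if no victims are pending and the wait state is "none".
fence_domain_idle() {
    awk '
        $1 == "victim" && $2 == "count" { victims = $3 }
        $1 == "wait"   && $2 == "state" { wait_state = $3 }
        END { exit !(victims == 0 && wait_state == "none") }
    '
}
```

Usage would be something like `fence_tool ls | fence_domain_idle && echo "no fencing pending"`. Note that an idle fence domain only says nothing is waiting to be fenced right now; it does not prove that the fence device itself is reachable.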

Fencing is working OK: the isolated server has a dedicated iLO interface, and I can see it being rebooted. I can't start the affected VMs under HA anymore. Everything works when I remove them from HA, but when I add them again, clustat reports the wrong owner.

Edit: I've rebooted all nodes and now everything works OK. Don't know why :/
 
Hi janzun

Can you give me a step-by-step of how you set up iLO?
I also have an HA cluster with Proxmox and iLO, but I get an error all the time (RIPCL).

What I've done:

- Installed Proxmox on 3 servers
- Created a cluster and joined all 3 servers
- Ran fence_tool join on all 3 nodes
- Configured cluster.conf like this:
cluster-conf21.jpg
Did I forget something?

My error is this one:
fence.jpg
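Since the screenshot contents are not reproduced here, this is not a diagnosis, but one way to narrow such an error down is to call the fence agent by hand with the same address and credentials that cluster.conf uses, outside the cluster stack. A rough sketch follows. The address, login and password are placeholders, check_fence_device and FENCE_AGENT are hypothetical names, and the -a/-l/-p/-o options are the usual fence-agent command-line options as far as I know, so please verify them with `fence_ilo -h` on your version.

```shell
# Sketch: ask one fence device for its power status through the fence agent.
# FENCE_AGENT can be overridden, e.g. for another agent or for a dry run.
FENCE_AGENT=${FENCE_AGENT:-fence_ilo}

check_fence_device() {
    # $1 = iLO address, $2 = login, $3 = password (all placeholders)
    if "$FENCE_AGENT" -a "$1" -l "$2" -p "$3" -o status >/dev/null 2>&1; then
        echo "fence device $1: status check ok"
    else
        echo "fence device $1: status check failed"
        return 1
    fi
}
```

Running `check_fence_device <ilo-address> <login> <password>` from each node against each iLO shows whether the agent can talk to the device at all. If that already fails, the problem is with the iLO access rather than with the HA configuration.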

Thanks for your help.
Greetz, Yves