HA error behaviour

janzun

I'm testing HA with this scenario:

- 3 x PVE 2.1, fresh install, latest full upgrade applied
- Fencing with iLO/IPMI working OK, on a dedicated interface
- 2 x KVM Ubuntu Server 12.04 VMs

First, I configured HA for the Ubuntu VMs, and HA placed both on the same server. Second, I isolated that server by unplugging its Ethernet cables (bonding). The cluster detected the node as down (Aug 01 15:10:31 rgmanager State change: virtualnc02 DOWN), but nothing happened with the VMs: they did not start on another node. Finally, I plugged the Ethernet cables back in; the VMs appeared as shut down in the web interface, and I had to remove them from HA to power them up again.

The whole time, rgmanager shows:

Aug 01 15:11:54 rgmanager [pvevm] VM 100 is running
Aug 01 15:11:54 rgmanager [pvevm] VM 101 is running

but that is false.

And clustat says the HA VMs (100 and 101) are on node 'virtualnc03', but they are really on 'virtualnc02', which was the isolated one:

root@virtualnc02:~# clustat
Cluster Status for VIRTUALNC @ Wed Aug 1 15:48:23 2012
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 virtualnc03           1 Online, rgmanager
 virtualnc04           2 Online, rgmanager
 virtualnc02           4 Online, Local, rgmanager

 Service Name       Owner (Last)       State
 ------- ----       ----- ------       -----
 pvevm:100          virtualnc03        started
 pvevm:101          virtualnc03        started
root@virtualnc02:~# ssh virtualnc03 qm list
root@virtualnc02:~# qm list
      VMID NAME       STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 prueba     stopped    512               15.00 0
       101 prueba2    stopped    512               15.00 0


So, to me, this is erratic HA behaviour. Any ideas?
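
The mismatch described in this post (clustat reports a service as started on one node while qm on that node does not show the VM running) can be checked mechanically. The following is only a minimal sketch, not an official Proxmox tool: the function names are hypothetical, the parsing assumes the column layout of the clustat and qm list output pasted above, it assumes VM names without spaces, and it assumes passwordless root ssh between nodes as used in the post.

```shell
#!/bin/sh
# Print "<vmid> <owner>" for every pvevm service that clustat reports
# as "started". Reads clustat text output on stdin.
started_services() {
    awk '$1 ~ /^pvevm:/ && $NF == "started" { sub(/^pvevm:/, "", $1); print $1, $2 }'
}

# Print the VMIDs that qm list reports as "running".
# Reads qm list text output on stdin; the header line is skipped
# because its first field is not numeric.
running_vms() {
    awk '$1 ~ /^[0-9]+$/ && $3 == "running" { print $1 }'
}

# Compare both views (hypothetical helper; needs a live cluster).
# ssh -n keeps ssh from consuming the loop's stdin.
check_ha_state() {
    clustat | started_services | while read -r vmid owner; do
        if ! ssh -n "$owner" qm list | running_vms | grep -qx "$vmid"; then
            echo "MISMATCH: VM $vmid is 'started' on $owner per clustat, but not running there"
        fi
    done
}
```

With the output shown in this post, check_ha_state would be expected to flag both VM 100 and VM 101, since qm list on virtualnc03 returns nothing.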
 
What is the output of

# fence_tool ls

Most likely your fencing device is not working, because it does not have power.
 
Hi Dietmar,

root@virtualnc03:~# fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 4

Fencing is working OK: there is a dedicated iLO interface on the isolated server, and I can see the server being rebooted. I can't start the affected machines under HA anymore. Everything works when I remove them from HA, but when I add them again, clustat says their owner is the wrong node.

Edit: I've rebooted all nodes and now everything works OK. I don't know why :/
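
For repeated checks, the two fields of interest in the fence_tool ls output above can be extracted with a small filter. This is a sketch only: the function names are hypothetical, and it assumes the "victim count" and "members" line format shown in this thread. As far as I understand it, a victim count above 0 means nodes are still waiting to be fenced; treat that reading as an assumption and verify it against the cman documentation.

```shell
#!/bin/sh
# Print the number after "victim count". Reads fence_tool ls output on stdin.
fence_victim_count() {
    awk '$1 == "victim" && $2 == "count" { print $3 }'
}

# Print the node IDs listed on the "members" line, space separated.
fence_members() {
    awk '$1 == "members" { $1 = ""; sub(/^ +/, ""); print }'
}
```

On a node, this would be used as `fence_tool ls | fence_victim_count` and `fence_tool ls | fence_members`.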
 
Hi janzun

Can you give me a step-by-step description of how you set up iLO?
I also have an HA cluster with Proxmox and iLO, but I get an error all the time (RIPCL).

What I've done:

- Installed Proxmox on 3 servers
- Created a cluster and joined all 3 servers to it
- Ran fence_tool join on all 3 nodes
- Configured cluster.conf like this:
cluster-conf21.jpg
Did I forget something?

My error is this one:
fence.jpg

Thanks for your help.
Greetz Yves
 
