Proxmox 3.4 cluster: HA VMs don't migrate if a node fails

coppola_f

Guys,

I've installed a two-node cluster with a quorum disk, following the wiki articles.
Everything is built on two HP ProLiant DL360 Gen9 servers plus a shared HP MSA 2040 SAS 6G storage box with multipath.
Quorum is provided by an iSCSI target hosted on a QNAP NAS.

Everything seems to work fine,
and the tests run with the common tools pass.

VMs live-migrate between nodes,
if we stop the rgmanager service the VMs migrate,
if we shut down or reboot a node the VMs migrate,
if we fence the node all is fine again and the VMs migrate.

Now,
I've done another test: we abruptly unplugged both power cables from one server's PSUs.
The VMs did not migrate to the surviving node,
even after waiting several minutes.
When I plugged the AC power cords back in, after a few seconds (I suppose once the iLO 4 had finished booting) the VMs started on the surviving node.

Well,
this looks like a failure in the HA setup,
surely due to a mistake of mine.

I can provide any information you need to help me solve the issue.

Looking forward to your questions or suggestions!

regards,

Francesco
 
Do you use iLO as the fencing device? If so, that is the issue on PVE 3.x: if you unplug the whole server from power, its iLO loses power too, so the other node can't reach the fencing device and can't fence the failed node (and therefore doesn't migrate the VMs). You can add a second fencing device outside the server (an APC device?) or upgrade to PVE 4.x, since PVE 4.x works with a (hardware) watchdog timer.
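
To make that failure mode concrete, here is a toy Python model of the rule described above: the surviving node restarts the VMs only after some fence device has confirmed that the failed node is off. This is not Proxmox or rgmanager code; `FenceDevice`, `try_recover`, and the device names are made up for illustration, and "reachable" is a simplification of everything a real fence agent checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FenceDevice:
    """Hypothetical stand-in for a fence agent (not a real Proxmox API)."""
    name: str
    reachable: bool  # can the surviving node talk to this device right now?

    def fence(self) -> bool:
        # In this model a fence succeeds only if the device answers.
        return self.reachable


def try_recover(devices: List[FenceDevice]) -> str:
    """Try each fence device in order; recover VMs only after one succeeds."""
    for dev in devices:
        if dev.fence():
            return f"fenced via {dev.name}: VMs restarted on surviving node"
    return "fencing failed: VMs stay blocked"


# Both power cords pulled: the iLO inside the server is dead as well.
ilo = FenceDevice("ilo", reachable=False)
# An external switched PDU has its own power and network connection.
pdu = FenceDevice("apc_pdu", reachable=True)

print(try_recover([ilo]))       # iLO as the only fence device
print(try_recover([ilo, pdu]))  # iLO first, external PDU as second device

ilo.reachable = True            # cords plugged back in, iLO has booted
print(try_recover([ilo]))
```

The last call mirrors what was observed in the first post: once the iLO answers again, the pending fence succeeds and the VMs start on the other node.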
 
Yes, we're using iLO;
it's the only fencing device we have in this configuration!

Can you suggest a known-working APC device model for this config?
Thanks

Francesco
 
Any APC that can switch power off/on will work. We ran it with many different models when we were using PVE 3.x.
Currently I have only one PVE 3.4 cluster left (all the others are on PVE 4.x now), and that cluster uses APC AP8653s with firmware 6.3.3.
 
OK,

many thanks,
I'll evaluate this with the customer.
If they have an old server able to run Proxmox, we'll upgrade to v4.x and keep the third (older) node as a "minor node" just to meet the HA and quorum requirements.
If that option isn't available, we'll install an APC managed power bar to gain a second fencing device.

regards,
Francesco
 
You can add a second fencing device outside your server (APC device?)
There is also another solution for this kind of problem:
The list of fence devices includes one named "fence_ack_manual". It only works when the server is powered off, and it requires manual intervention from any PVE node that is still alive; in other words, this kind of fence needs no hardware device.

The real purpose of this fence was clusters built over the Internet, where a node can't reach another node with traditional fencing techniques. In your case it can serve as a second fence option, provided that you first completely disconnect the power cord of the problem server.

We should also ask: what happens to our VMs if a fence device unexpectedly breaks? Either we have a second fence device in addition to the first (and correctly configured in our PVE cluster), or we use a manual fence (which must also be correctly configured in our PVE cluster).

Best regards
Cesar
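
The idea of a manual fence as the last option can be sketched as a small Python model. It is only an illustration, not how fence_ack_manual is implemented: `first_successful_fence`, `manual_ack`, and the method names are hypothetical, and the operator's confirmation is reduced to a boolean.

```python
from typing import Callable, List, Optional, Tuple

# A fence method here is just a name plus a callable that reports success.
FenceMethod = Tuple[str, Callable[[], bool]]


def first_successful_fence(methods: List[FenceMethod]) -> Optional[str]:
    """Return the name of the first method that confirms the node is off."""
    for name, attempt in methods:
        if attempt():
            return name
    return None  # nothing confirmed: recovery must keep waiting


def manual_ack(operator_confirmed_power_off: bool) -> Callable[[], bool]:
    """Manual fence: succeeds only once a human vouches the node is unplugged."""
    return lambda: operator_confirmed_power_off


def automatic_fence_down() -> bool:
    """Stand-in for an automatic fence device that cannot be reached."""
    return False


waiting = [("ilo", automatic_fence_down), ("manual", manual_ack(False))]
acked = [("ilo", automatic_fence_down), ("manual", manual_ack(True))]

print(first_successful_fence(waiting))  # no confirmation yet
print(first_successful_fence(acked))    # operator has unplugged and confirmed
```

The point of the model is the ordering: the manual step is only reached after the automatic device has failed, and it must never be confirmed unless the node really is powered off.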
 
Guys,

as you suggested,
we've added an APC 7902 device as additional fencing.

Now everything is running fine.

many thanks again for your help!

regards,
francesco
 