HA not functioning properly (or as advertised)

Drkrieger

New Member
Jul 27, 2018
Hello!
We have a Proxmox VE 5.2 cluster consisting of 4 nodes, each running on a Supermicro server. We created the cluster, created an HA group, and added VMs to the group. We then spread the VMs across the 4 machines and tested a node failure (pulled the power).

When the node failed, the VMs being hosted simply froze and went completely unresponsive. We were hoping that there would be a live migration to another node and there would be no (or minimal) interruption.

In all of the documentation we've pored over, there have been hints that fencing needs to be set up on these nodes. We've been unable to find any solid information on how to configure this for Proxmox 5, nor has any of the documentation stated that it is truly required. The servers do have Supermicro's IPMI built in, but we're not sure how to configure them so Proxmox can take advantage of it.

Did our VMs fail to migrate due to a fencing misconfiguration, or did we miss a step?
 
When a node fails, it's too late for the VM to be live migrated. Either the failed node has crashed completely, or it has self-fenced. In either case, the HA stack will restart the VM on one of the remaining nodes.
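
One reason the remaining nodes can take over the VMs at all is that they still hold a majority of the cluster votes. Here is a minimal Python sketch of that arithmetic for the 4-node cluster in the question. It assumes one vote per node and a simple-majority quorum rule; the function names are hypothetical and this is not Proxmox code.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def is_quorate(total_nodes: int, failed_nodes: int) -> bool:
    """True if the surviving nodes still hold a strict majority.

    Assumption: every node carries exactly one vote.
    """
    surviving = total_nodes - failed_nodes
    return surviving >= quorum_threshold(total_nodes)


if __name__ == "__main__":
    # Walk through 0, 1 and 2 failed nodes in a 4-node cluster.
    for failed in range(3):
        print(f"4 nodes, {failed} failed -> quorate: {is_quorate(4, failed)}")
```

Under these assumptions, losing one node out of four leaves three votes, which still meets the threshold of three; losing two does not.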
 
Hi
When the node failed, the VMs being hosted simply froze and went completely unresponsive. We were hoping that there would be a live migration to another node and there would be no (or minimal) interruption.

What you describe is not HA; in QEMU it is called COLO, or fault tolerance.
This feature is very costly in hardware and is not implemented yet.
HA always restarts the VM on a new node.
It is also normal that this takes time and is not instant.
 
Thank you for the info, I appreciate the quick response.
The company I work for is interested in purchasing Proxmox; however, they're going to want the COLO/fault tolerance feature. Is this something that will be brought to Proxmox in the near future? Or is there a way to do this currently from the command line?
 
Hi,
You can get a better result in Proxmox, in terms of HA, using what I would call a classic scenario.

Let's say you have a Proxmox cluster with some VMs/CTs that you want to use for HA services. Take an example: I have an HTTP server and I want it to be highly available.
For this we need at minimum 2 VMs with the same web server. It is much better to have 3 VMs, one VM on each Proxmox node. Each VM has a different IP address (ip1, ip2 and ip3). We need some tools: haproxy and ucarp, installed on each node. haproxy is a failover and/or load-balancing TCP proxy that can forward any incoming connection to any backend server (ip1, ip2, ip3). ucarp creates a cluster IP (VIP) that can be migrated in less than 1 second from a faulty node to any other online node.
haproxy listens for clients on the VIP and proxies them to a backend VM (if at least one of them is available). If the node that currently holds the VIP (only one node holds it at a time) breaks for whatever reason, the ucarp daemon on one of the other 2 nodes performs a takeover and brings the VIP up there. ucarp can run a script on VIP up/down events. On a takeover, the VIP-up script can restart haproxy, and your HTTP service is OK again. This whole process can be done in a few seconds.
Proxmox HA can take tens of seconds in the best case (15 sec in my own case versus 2 sec with ucarp/haproxy).
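
The failover logic described above can be sketched as a toy model in Python. This is not haproxy or ucarp code: the node names, the fixed priority order used to pick the VIP holder, and all function names are assumptions made for illustration. Real ucarp elects the holder through its own advertisement protocol.

```python
from typing import Dict, List, Optional, Tuple


def vip_holder(nodes: List[str], alive: Dict[str, bool]) -> Optional[str]:
    """Return the node that should hold the VIP.

    Assumption: the first alive node in priority order wins.
    """
    for node in nodes:
        if alive.get(node, False):
            return node
    return None


def healthy_backends(backends: List[str], healthy: Dict[str, bool]) -> List[str]:
    """Keep only the backends whose health check currently passes."""
    return [b for b in backends if healthy.get(b, False)]


def route(request_no: int, nodes: List[str], alive: Dict[str, bool],
          backends: List[str], healthy: Dict[str, bool]) -> Optional[Tuple[str, str]]:
    """Return (VIP holder, chosen backend), or None if the service is down."""
    holder = vip_holder(nodes, alive)
    pool = healthy_backends(backends, healthy)
    if holder is None or not pool:
        return None
    # Plain round-robin over the healthy backends.
    return holder, pool[request_no % len(pool)]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    backends = ["ip1", "ip2", "ip3"]
    alive = {"node1": True, "node2": True, "node3": True}
    healthy = {"ip1": True, "ip2": True, "ip3": True}
    print(route(0, nodes, alive, backends, healthy))

    # node1 loses power: it drops the VIP and its VM (ip1) goes away with it.
    alive["node1"] = False
    healthy["ip1"] = False
    print(route(0, nodes, alive, backends, healthy))
```

In this model, clients only ever talk to the VIP, so a node failure changes which node answers and which backends are in the pool, but not the address the clients use.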
There are also many other useful things about haproxy/ucarp, like:
- if one of the 3 HTTP servers is hacked or misconfigured, this mostly does not affect the rest of them; in this case Proxmox HA will not help you (you can remove the IP of the broken HTTP service from haproxy or shut down this VM)
- the total load of your clients can be shared between all the online HTTP servers, or split by weights if you want (10% for the first one, 30% for the second...)
- better testing of updates (packages, configuration changes, and others): you can remove just one HTTP server from haproxy and test for as long as you need, without disturbing any of your clients
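
The weighted load sharing from the list above can be illustrated with a short Python sketch. It is a simplified model, not how haproxy schedules requests internally. The 60% share for the third server is an assumption (the post only names 10% and 30%), and the function names are hypothetical.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict, List


def weighted_schedule(weights: Dict[str, int]) -> List[str]:
    """Expand weights into one cycle of picks, e.g. {"a": 1, "b": 3} -> a, b, b, b."""
    schedule: List[str] = []
    for backend, weight in weights.items():
        schedule.extend([backend] * weight)
    return schedule


def distribute(weights: Dict[str, int], requests: int) -> Counter:
    """Count how many of `requests` land on each backend when the cycle repeats."""
    return Counter(islice(cycle(weighted_schedule(weights)), requests))


if __name__ == "__main__":
    # Weights 1:3:6 correspond to 10% / 30% / 60% of the traffic.
    weights = {"ip1": 1, "ip2": 3, "ip3": 6}
    print(distribute(weights, 100))

    # Taking ip1 out for testing leaves the other two sharing the load 3:6.
    del weights["ip1"]
    print(distribute(weights, 90))
```

Removing a server from the weight table is the model's equivalent of taking it out of haproxy for testing: the remaining servers keep serving clients in their relative proportions.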

Yes, you may ask: what happens if the same VM also runs a database? You can use a clustered database system (like Percona XtraDB Cluster for MySQL).

You may also ask whether this scenario can work in a 24/7 environment. Yes, it works... at least for me (for at least 5 years).

Is it complicated? At first sight it seems to be, but once you are familiar with these tools and concepts it is simple and also rock solid.

Good luck with HA ;)
 
Thanks for the info, Guletz, but we'll be running pretty much our entire company within this cluster, so it will be hard to duplicate every service with redundancies without bloating the cost due to licensing.
In our Proxmox cluster we will have our Active Directory DCs (Windows at the moment, may be switching to Samba/LDAP), an MS Exchange 2010 server, several internal websites and web services (Linux based), ERP servers (SAP/Windows), and a reporting server (also Microsoft, but again, it will be switched out for a Linux-based one within a year). If we try to cluster all of these in VMs, we're looking at several tens of thousands of dollars in MS licensing costs. We were hoping to use a fault-tolerant setup instead of having to run multiple server instances.

I will begin looking into QEMU COLO. I'm hoping it will work well with Ceph as we're using that as the storage backend (separate from the Proxmox cluster).


AFAIK COLO is not bug-free and so not production ready.
But keep in mind that you have to mirror everything.
That means you need a dedicated NIC with at least 40 Gbit per VM for the COLO feature.
We're going to try this with a dedicated 10Gb connection. We're pretty sure this is going to be overkill, as we were able to do this with VMware using just 1Gb links without any performance hits. We just don't want to pay the ludicrous VMware licensing costs anymore.
 
