How to set up a spare node for PVE Cluster HA

joelserrano

Mar 16, 2011
Hi,

We have a 5 node cluster and we are planning to buy a 6th server as a spare server and enable HA.

How does PVE decide on which node to start the VMs when a working member fails? Does it depend on free resources? Or on the number of VMs? I mean, what's the "formula"? :)

Also, if I want all VMs to have HA, do I have to go one by one or is there an "enable HA on all VMs" sort of thing? (Maybe with the API or pvesh.)

Thanks in advance.

Best regards,
Joel.
 
There is no such thing as a spare server: a server either participates in the cluster or it does not. Since Proxmox lacks load-balancing features, VMs will be moved randomly to other servers.

https://access.redhat.com/site/docu...min-manage-ha-services-operations-cli-CA.html
Relocate: Move the service to another node. Optionally, you may specify a preferred node to receive the service, but the inability of the service to run on that host (for example, if the service fails to start or the host is offline) does not prevent relocation, and another node is chosen. rgmanager attempts to start the service on every permissible node in the cluster. If no permissible target node in the cluster successfully starts the service, the relocation fails and the service is attempted to be restarted on the original owner. If the original owner cannot restart the service, the service is placed in the stopped state.

Commands: clusvcadm -r <service_name> or clusvcadm -r <service_name> -m <member> (Using the -m option specifies the preferred target member on which to start the service.)

rgmanager uses relocate to move VMs away from failing nodes.

You can control relocation to some extent with static configuration, through failover domains in the cluster.conf file.
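
As a rough illustration of what such a static configuration might look like, here is a small Python sketch that generates an `<rm>` fragment with one ordered, restricted failover domain. The node names, the domain name and the VMIDs are made up, and the `domain` attribute on `pvevm` is an assumption about how PVE's cluster.conf ties a VM to a domain; check it against your own cluster.conf before using anything like this. The output would still have to be merged into cluster.conf by hand.

```python
import xml.etree.ElementTree as ET


def build_rm_fragment(domain_name, node_priorities, vmids):
    """Build an <rm> element with one ordered, restricted failover domain.

    node_priorities maps node name -> priority (lower number = preferred).
    All names passed in are hypothetical examples.
    """
    rm = ET.Element("rm")
    domains = ET.SubElement(rm, "failoverdomains")
    domain = ET.SubElement(
        domains, "failoverdomain",
        name=domain_name, ordered="1", restricted="1", nofailback="0",
    )
    # List the nodes in priority order so the file is easy to read.
    for node, prio in sorted(node_priorities.items(), key=lambda kv: kv[1]):
        ET.SubElement(domain, "failoverdomainnode", name=node, priority=str(prio))
    # One HA entry per VM; the "domain" attribute is an assumption (see above).
    for vmid in vmids:
        ET.SubElement(rm, "pvevm", autostart="1", vmid=str(vmid), domain=domain_name)
    return rm


rm = build_rm_fragment("prefer-node1", {"node1": 1, "spare": 2}, [100, 101])
if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
    ET.indent(rm)
print(ET.tostring(rm, encoding="unicode"))
```

Looping over a list of VMIDs like this is also one way to avoid writing the HA entries one by one.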
 
I'm going to have a look at that doc and then try the cluster.conf failover domain settings.

Thanks for the info mir!

I'll get back with my results :)

Best regards,
Joel.
 
From what I know there are no checks whatsoever. rgmanager simply asks a node to start a specific VM. If the VM starts, all is well; if it fails, rgmanager tries to start it on another node. This continues until the VM starts or every node in the cluster has been tried.
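
The behaviour described above can be sketched in a few lines of Python. This is only a toy model of the "try the next node until one works" idea, not rgmanager's actual code; `try_start` and the node names are hypothetical stand-ins.

```python
def relocate(vmid, nodes, try_start):
    """Ask each candidate node in turn to start the VM.

    Returns the first node on which the start succeeded, or None if
    every node was tried and all of them failed.
    """
    for node in nodes:
        if try_start(node, vmid):
            return node
    return None


# Hypothetical stand-in for "ask this node to start the VM".
can_start = {"node2": False, "node3": True, "node4": True}


def try_start(node, vmid):
    return can_start.get(node, False)


print(relocate(100, ["node2", "node3", "node4"], try_start))  # prints node3
```

Note that nothing here looks at free memory or at how many VMs a node already runs, which matches the "no checks" description.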
 
Hi,

After checking what "mir" stated about failover domains, I have done some searching and got to this link:

https://fedorahosted.org/cluster/wiki/FailoverDomains

It explains perfectly how failover domains work, and there are different settings for different scenarios. Personally I think it covers most situations; at least the one I was looking for is solved easily. It is just a matter of deciding how I want the "spare" nodes to work.
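
To make the "spare" idea concrete, here is a minimal Python sketch of how an ordered and/or restricted failover domain narrows down the target node, as I understand the semantics from that page. It is a simplified model with made-up node names, not the real rgmanager logic, and the tie-breaking for unordered or non-member nodes is an arbitrary choice of this sketch.

```python
def pick_target(online_nodes, domain_nodes, ordered=True, restricted=True):
    """Pick a node for a service bound to a failover domain.

    domain_nodes maps node name -> priority (lower number = preferred).
    Returns a node name, or None if the service may not run anywhere.
    """
    members = [n for n in domain_nodes if n in online_nodes]
    if ordered:
        # Ordered domain: prefer the online member with the lowest priority number.
        members.sort(key=lambda n: domain_nodes[n])
    if members:
        return members[0]  # unordered: any member would do, the first is arbitrary
    if restricted:
        return None  # restricted domain: never run outside the domain
    # Unrestricted domain: fall back to any other online node (arbitrary pick).
    others = sorted(n for n in online_nodes if n not in domain_nodes)
    return others[0] if others else None


domain = {"node1": 1, "spare": 2}  # node1 preferred, "spare" as fallback
print(pick_target({"node1", "node2", "spare"}, domain))  # prints node1
print(pick_target({"node2", "spare"}, domain))           # prints spare
```

With one such domain per production node, each listing the spare as its lowest-priority member, the spare only receives VMs when their preferred node is down.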

Thank you everybody.

Best regards,
Joel.