Sync public IP failover with Proxmox HA

EuroDomenii

Well-Known Member
Sep 30, 2016
144
30
48
Slatina
www.domenii.eu
A) The first problem is the triggering moment

A1) Monitoring via cron when a node changes status from online to fenced.

Doable with GET /cluster/ha/status/manager_status, either from an external website or from a script inside the nodes (the script should live on all nodes, but only be triggered on the node that currently holds the master status in the cluster, in order to avoid duplicate runs).

This approach has the advantage of moving all the failover IPs right at the beginning. Moving a failover IP has a latency of several minutes. With 50 VMs, the first 10 won't have their IPs ready yet, but for the last 40 the public failover IPs should already have been routed to the new node.
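For illustration, a minimal cron sketch of A1 (my assumption, not a tested script): poll the HA manager status through pvesh and act only on the node that is currently CRM master, so the same job can live on every node without duplicate runs. The exact JSON field names (master_node, node_status) and the output flag may differ between PVE versions.

Code:
#!/usr/bin/env python3
import json
import socket
import subprocess

def manager_status():
    # pvesh is the local Proxmox API CLI; adjust the output flag to your PVE version
    out = subprocess.check_output(
        ["pvesh", "get", "/cluster/ha/status/manager_status", "--output-format", "json"]
    )
    return json.loads(out)

status = manager_status()
me = socket.gethostname()

# Only the node currently holding the CRM master role acts.
if status.get("master_node") == me:
    # node_status maps node name -> state ("online", "fence", ...).
    for node, state in status.get("node_status", {}).items():
        if state != "online":
            print(f"node {node} is {state}: move its failover IPs here")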

A2) How should I trigger scripts at the moment a node is fenced (without needing cron)? Or when a VM is migrated by the HA manager to a new node?

In the above scenario, if the node is fenced only for a short time, I should not move all the IPs from the beginning, because some VMs won't be migrated; they would remain on the initial node.

B) Next step: moving the IP

- I obtain the container IP with GET /nodes/{nodefenced}/lxc/{vmid}/config. What if, in the meantime, the container has already been restored to the next active node?
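As a hedged sketch of that step (the field layout is an assumption): read the container config through pvesh and pull the address out of the net0 line, which typically looks like name=eth0,bridge=vmbr0,ip=203.0.113.10/32,gw=....

Code:
import json
import subprocess

def container_ip(node, vmid):
    # GET /nodes/{node}/lxc/{vmid}/config returns the raw container configuration
    out = subprocess.check_output(
        ["pvesh", "get", f"/nodes/{node}/lxc/{vmid}/config", "--output-format", "json"]
    )
    config = json.loads(out)
    for part in config.get("net0", "").split(","):
        if part.startswith("ip="):
            return part[3:].split("/")[0]  # drop the "ip=" prefix and the CIDR suffix
    return None

print(container_ip("node1", 101))  # placeholder node name and VMID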

- If I could store the OVH API service (dedicated server) ID in a comment/note on the Proxmox node, I would just run the OVH API client command to move the IP to the new active server (it should be the next online node with higher priority in the HA group).
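A rough sketch of that call with the python-ovh client (the failover IP and the target service name below are placeholders; in practice the service name would come from the note stored on the Proxmox node):

Code:
# pip install ovh ; credentials come from ovh.conf or environment variables
import urllib.parse
import ovh

client = ovh.Client()

failover_ip = "203.0.113.10/32"              # placeholder failover IP block
target_service = "nsXXXXXX.ip-203-0-113.eu"  # placeholder dedicated server service name

# POST /ip/{ip}/move asks OVH to route the failover IP to another service;
# the IP sits in the URL path, so the "/" must be URL-encoded.
task = client.post(
    "/ip/{}/move".format(urllib.parse.quote(failover_ip, safe="")),
    to=target_service,
)
print(task)  # returns an IP task that can be polled until the move completes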
 
Hi EuroDomenii,

Have you already found a way to move the IP block between servers in case one of the nodes goes down?
 
In my opinion, your idea of moving VMs (many of them, at least) very fast from a failed node to another online node, while also waiting in case the failing node can come back online within a decent time (so you do not want to move any VM before that maximum time is reached), is not the best :)

Maybe you can think about another idea, like this:

- let's say that right NOW you have x1 VMs on node n1, x2 VMs on n2, and so on
- so if n1 goes down, you NEED to move x1 VMs (maybe fewer) to, let's say, node n7 (put your desired number)
- because x1 is big, the time to restore all these VMs on n7 will be very long ... not so funny

Let's do the same task like this:

- create the same VM on 2 different nodes, equally distributed over the rest of the Proxmox cluster, e.g. VM 1 on node n1 and on node n2, VM 2 on node n2 and so on (or maybe all of them on n4, if you want; it depends on your landscape ... you will know what is better for you)
- so for each VM on n1 you will have 2 identical VMs (one on n1 and the 2nd instance on another node)
- you put HAProxy in front of your clients for each pair of VMs, using the proper TCP ports (HAProxy listens for the clients and routes the traffic in fail-over mode, with the active host on n1 while it is online and its pair, hosted on let's say n7, as backup); see the example config below
- now when n1 is down for more than, let's say, t1 (like 10 seconds, or the desired timeout), all the clients that were using n1 are redirected to the backup VM
- maybe you can consider using a load-balancing setup instead of the HAProxy fail-over (in this case, only 50% of your clients will be affected if n1 or n7 is down)

I omitted many details so that you can see the main idea: it is more useful to have some kind of fail-over/balancing at the TCP level (with HAProxy you only need to re-route your clients), compared with the case when you need to move a large number of VMs to another node.
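As an illustration of that active/backup pairing, a minimal haproxy.cfg sketch (all names, addresses and ports are placeholders, not a tested configuration):

Code:
frontend fe_web
    bind *:80
    default_backend be_vm1

backend be_vm1
    # plain fail-over: traffic goes to the primary copy on n1; the copy on n7
    # is only used when the primary fails its health checks
    option httpchk GET /
    server vm1_n1 192.168.10.11:80 check
    server vm1_n7 192.168.10.77:80 check backup

Dropping the "backup" keyword on the second server turns the same pair into the load-balancing variant mentioned above.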

... sorry for my very bad English :), but I can explain in your native language (ro) ... ;)
 
- because x1 is big, the time to restore all these VMs on n7 will be very long ... not so funny
I noticed that you signed up on the Proxmox forum recently (a month and a half ago) and you already have plenty of activity here. This reminds me of my own enthusiasm when I discovered this beautiful piece of software last autumn :). I am not a veteran, but please allow me to gently point out a few things regarding Proxmox HA.

First, we are not talking about restoring huge VMs to another node when downtime occurs. Of course, that would take too long; it would not be HA anymore.

There are 2 use cases:
  1. Either you have shared storage (e.g. Ceph), as in https://pve.proxmox.com/wiki/High_Availability#_requirements, and Proxmox HA only migrates the configuration of x1 from n1 to n7.

  2. Or you use remote incremental send/receive, like https://pve.proxmox.com/wiki/PVE-zsync, giving a near-continuous backup, as in the scenario from https://forum.proxmox.com/threads/high-availability-without-san.11378/. We use an in-house adapted version of https://github.com/digint/btrbk, with BTRFS storage (see https://forum.proxmox.com/threads/btrfs-experimental-storage-pve-4-4-13-testing-issues-patch.33896/); an example pve-zsync command is shown below.
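For instance, a pve-zsync job of the kind described in that wiki page might look like this (VMID, target IP and dataset are placeholders; please check the current syntax before relying on it):

Code:
# create a recurring sync job for VM/CT 100 towards the standby node,
# keeping the last 7 snapshots on the destination ZFS dataset
pve-zsync create --source 100 --dest 192.168.1.2:tank/backup --name vm100sync --maxsnap 7 --verbose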

In my opinion, your idea of moving VMs (many of them, at least) very fast from a failed node to another online node, while also waiting in case the failing node can come back online within a decent time (so you do not want to move any VM before that maximum time is reached), is not the best :)

There are 2 use cases here.
  1. Clients with a dedicated IP, for customers that prefer a fully customizable VM (without HAProxy, perhaps with nginx acting as the web server directly). In this situation, my solution is not the best, but the only one: we must stay within the Proxmox HA fencing logic and synchronize the failover public IPs.
  2. Using HAProxy for multiple customers.

Your idea is nice. HAProxy is the best choice for load balancing and has free advanced health checks, which are otherwise only available in Nginx Plus: https://thehftguy.com/2016/10/03/haproxy-vs-nginx-why-you-should-never-use-nginx-for-load-balancing/

We recently played with https://github.com/imjoey/pyhaproxy/issues/6 and were not pleased with the performance of the Python HAProxy config parser. The principle is to parse the config file into objects/arrays, manipulate it, and finally render it back. When the config gets bigger, performance suffers. So we prefer to skip the parsing phase altogether by keeping all the configuration in a database and rendering it at the end with C, which is the fastest.
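The rendering idea, reduced to a toy Python sketch (the records below stand in for whatever the database would actually return; our real renderer is in C):

Code:
# render HAProxy backend stanzas straight from stored records,
# instead of parsing and rewriting an existing haproxy.cfg
backends = [  # placeholder rows, in reality fetched from the database
    {"name": "be_vm1",
     "servers": [("vm1_n1", "192.168.10.11:80", ""),
                 ("vm1_n7", "192.168.10.77:80", "backup")]},
]

def render(backends):
    lines = []
    for be in backends:
        lines.append(f"backend {be['name']}")
        lines.append("    option httpchk GET /")
        for name, addr, extra in be["servers"]:
            lines.append(f"    server {name} {addr} check {extra}".rstrip())
        lines.append("")
    return "\n".join(lines)

print(render(backends))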

For practical reasons, let's assume that our use case is OVH.
The OVH Hybrid Cloud rocks, linking dedicated servers to the OVH Public Cloud via the vRack private network.
In your scenario, you assume that HAProxy is already in a high-availability setup (on the OVH cloud), so you don't need to move any public IPs. In case one of your dedicated backend VMs fails, you only have to rebalance from HAProxy to the PRIVATE IPs in the vRack.

This post is about moving PUBLIC failover IPs. But since you brought it up, it is worth discussing.

If your HAProxy VM is not on the OVH public cloud but on one of your Proxmox cluster nodes, you fall back to my initial use case: in case of failure, you need to move the PUBLIC IPs from one node to another via the API.


Regarding connection speed between HAProxy and the Proxmox VM backends: with vRack you have 1 Gbps, but if HAProxy is also a Proxmox VM on the same node, LXC to LXC, iperf reports 32.3 Gbit/s?! Btw, does anyone consider vRack's 1 Gbps good enough for HAProxy-to-backend communication?


Let's do the same task like this:
- now when n1 is down for more than, let's say, t1 (like 10 seconds, or the desired timeout), all the clients that were using n1 are redirected to the backup VM

Further, there are challenges in matching the fencing logic of Proxmox with the load-balancing logic of HAProxy when the backends are not responsive. If we have shared storage, we are fine. But with incremental send/receive storage (ZFS, BTRFS), the nofailback behaviour of Proxmox HA must be ported to HAProxy as well, and the data resynchronized later.
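One way to mirror that behaviour, as a hedged sketch (backend/server names and the socket path are assumptions): when Proxmox fences a node, put the corresponding HAProxy server into maintenance over the runtime socket, and only bring it back after the data has been resynchronized.

Code:
# assumes haproxy.cfg contains: stats socket /var/run/haproxy.sock mode 600 level admin
import socket

def haproxy_cmd(command, sock_path="/var/run/haproxy.sock"):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall((command + "\n").encode())
        return s.recv(4096).decode()

# drain the server that lived on the fenced node
print(haproxy_cmd("set server be_vm1/vm1_n1 state maint"))
# ... later, once the ZFS/BTRFS copy has caught up again:
# haproxy_cmd("set server be_vm1/vm1_n1 state ready")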


Finally, for certain use cases, the idea of syncing the Proxmox HA logic with HAProxy load balancing is nice.
 
If your HAProxy VM is not on the OVH public cloud but on one of your Proxmox cluster nodes, you fall back to my initial use case: in case of failure, you need to move the PUBLIC IPs from one node to another via the API

ucarp can be used on any number of hosts. Then run HAProxy on the ucarp VIP (you can run scripts when the VIP comes up or goes down).
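For illustration, such a ucarp setup might be started roughly like this (interface, addresses and script paths are placeholders; check the ucarp man page for the exact options on your build):

Code:
# advertise VIP 203.0.113.50 with virtual host id 1; the up/down scripts can
# attach/detach the VIP, reload HAProxy, or trigger the OVH failover IP move
ucarp --interface=eth0 --srcip=192.168.10.11 --vhid=1 --pass=secret \
      --addr=203.0.113.50 \
      --upscript=/usr/local/bin/vip-up.sh --downscript=/usr/local/bin/vip-down.sh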
 
