getting a bit of HA

You could get a little more redundancy by using a second ssh tunnel over the DRBD interfaces and a couple of additional shell scripts to clone (and start locally) the VMs hosted by the peer: it will be useful when the peer goes down or when it is no longer reachable via the PVE link.
I'll try to clarify the idea; I hope my English is good enough, and I apologize for the long post.

Assume that N1 and N2 are the two nodes of a generic PVE-DRBD cluster, no matter which one is the master. Both nodes use one dedicated ethernet port for the DRBD connection (DRBD link) and a second ethernet port for both cluster and outside connections (PVE link). The nodes may be connected to separate switches ('SW1' and 'SW2') or to the same single switch 'SW'. We also know some hosts ("outside" IPs) located beyond the router 'RR', which provides the default route to the cluster. Both nodes run their own guest VMs.
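Just to fix the ideas for the snippets below, here is a hypothetical settings file the scripts could source; the names, paths and addresses are only placeholders, not necessarily what the real scripts use:

```bash
# /etc/drbd-ha.conf -- hypothetical settings sourced by the sketches below
# (adapt every value to your own setup)
PEER_NAME="N2"                        # the other node of the cluster
PEER_PVE_IP="192.168.1.12"            # peer address on the PVE link (cluster/outside)
PEER_DRBD_IP="10.0.0.2"               # peer address on the dedicated DRBD link
OUTSIDE_IPS="192.168.1.254 10.1.1.1"  # "outside" hosts beyond the router RR
FREEZER_DIR="/etc/un-qemu-server"     # where the cloned VM definitions are kept
MAIL_TO="admin@example.com"           # recipient of the warning emails
```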

As a preliminary step, you have to run on both nodes the script 'drbd-ssh-key-exchange', which allows us to use a second channel (DRBD ssh tunnel, DST) besides the one the cluster already provides (PVE ssh tunnel, PST).
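Roughly, 'drbd-ssh-key-exchange' has to do something like this (only a sketch, assuming the settings file above; the real script is in the repository linked at the end):

```bash
#!/bin/bash
# drbd-ssh-key-exchange -- sketch: enable the DST tunnel (ssh over the DRBD link)
. /etc/drbd-ha.conf

# generate a key pair if this node does not have one yet
[ -f /root/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa

# push our public key to the peer through its DRBD address
# (the very first run still needs the PVE tunnel or a password)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@"$PEER_DRBD_IP"

# quick test: if this answers, the DST tunnel is available
ssh -o BatchMode=yes -o ConnectTimeout=5 root@"$PEER_DRBD_IP" true && echo "DST tunnel OK"
```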

The script 'clone-vm', invoked by crontab on both nodes, copies the VM definition files ???.conf from the 'local' dir /etc/qemu-server to the 'remote' dir /etc/un-qemu-server (the freezer), so that each node knows the configuration files of the VMs running on the other node and keeps them ready to start in its freezer. The copies are performed through 'scp' instances with the ability to choose the correct tunnel to reach the peer: 'clone-vm' uses the PST or the DST tunnel according to their state.
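Again as a rough sketch, the tunnel selection in 'clone-vm' could look more or less like this (PST first, DST as fallback):

```bash
#!/bin/bash
# clone-vm -- sketch: copy the local VM definitions into the peer's freezer
. /etc/drbd-ha.conf

# pick the first tunnel that answers: PST (PVE link) first, then DST (DRBD link)
PEER=""
for ip in "$PEER_PVE_IP" "$PEER_DRBD_IP"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 root@"$ip" true 2>/dev/null; then
        PEER="$ip"
        break
    fi
done
[ -z "$PEER" ] && { echo "peer unreachable, nothing cloned"; exit 1; }

# freeze our ???.conf files on the peer, so it can start our VMs if we die
ssh root@"$PEER" "mkdir -p $FREEZER_DIR"
scp /etc/qemu-server/*.conf root@"$PEER":"$FREEZER_DIR"/
```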
Besides that, our main concern is the health of the PVE link: we have to monitor whether the VMs hosted by each node are still visible from outside the cluster (a lack of DRBD connectivity will lead to a split-brain situation, which is managed by the DRBD handlers and not discussed here).

That said, let's suppose that the cluster goes wrong and ends up in a not-synced state; the broken-tunnel event may be raised by an icmp-based daemon, by another monitoring tool or by a Nagios plugin. More precisely: node N1 cannot successfully ping node N2 but, at the same time, node N1 successfully pings the outside IPs located beyond the router RR.
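For example, such an icmp-based check could be as simple as this (only a sketch, the real monitoring can of course be smarter):

```bash
#!/bin/bash
# broken-tunnel check -- sketch: peer lost on the PVE link, outside still reachable
. /etc/drbd-ha.conf

ping -c 3 -W 2 "$PEER_PVE_IP" >/dev/null 2>&1
peer_ok=$?

outside_ok=1
for ip in $OUTSIDE_IPS; do
    ping -c 3 -W 2 "$ip" >/dev/null 2>&1 && outside_ok=0 && break
done

# peer unreachable but the outside world still visible: raise the broken-tunnel event
if [ $peer_ok -ne 0 ] && [ $outside_ok -eq 0 ]; then
    /usr/local/sbin/start-cloned-vm
fi
```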

Since the above ping attempts use the PVE ports and PVE IP addresses, node N1 is then authorized to think that the VMs hosted by node N2 are no longer visible from outside: in fact the failed ping attempts use the paths:
a) 'N1-SW-N2' in case of a single switch
b) 'N1-SW1-RR-SW2-N2' in case the nodes are connected to separate switches
so:
node N2 may be down/hung/locked-up OR the PVE ethernet port is down OR the PVE ethernet cable is broken OR... in path a)
node N2 may be down/hung/locked-up OR something is broken along path b)
...anyway - as said - the VMs hosted by node N2 are definitely unseen from outside.

The script 'start-cloned-vm' is then triggered on both nodes by the broken-tunnel event: let's see what happens in both the N1 and N2 nodes (the behaviors are simultaneous).

*inside Node N1*
Node N1 checks whether the DST tunnel (previously created by 'drbd-ssh-key-exchange') is available, in order to detect if node N2 is still alive. Please note that N2 may be up & running but unreachable due to a PVE ethernet failure such as an unplugged cable.
If the DST tunnel works, then N1 will inject a qmigrate command into node N2 just using the DRBD ssh tunnel: this way all the VMs hosted by the unreachable node N2 will be safe and available via N1.
If the DST tunnel is also down (N2 seems really dead), then node N1 will copy the configuration files of the VMs hosted on the dead node N2 (previously frozen by 'clone-vm') from its 'freezer' to /etc/qemu-server and will start them locally. As above, the VMs hosted by the unreachable node N2 will be safe and available via N1.
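Put into shell terms, the N1 side could look more or less like this (only a sketch: in particular, double-check the exact qmigrate syntax on your PVE version):

```bash
#!/bin/bash
# start-cloned-vm (N1 side) -- sketch: rescue the VMs of an unreachable peer
. /etc/drbd-ha.conf

if ssh -o BatchMode=yes -o ConnectTimeout=5 root@"$PEER_DRBD_IP" true 2>/dev/null; then
    # N2 is alive but the PVE link is broken: ask it, via the DST tunnel,
    # to migrate its VMs to this node
    for vmid in $(ssh root@"$PEER_DRBD_IP" "ls /etc/qemu-server" | sed 's/\.conf$//'); do
        ssh root@"$PEER_DRBD_IP" "qmigrate --online $(hostname) $vmid"
    done
else
    # N2 seems really dead: thaw its VM definitions from the freezer and start them here
    for conf in "$FREEZER_DIR"/*.conf; do
        [ -e "$conf" ] || continue
        vmid=$(basename "$conf" .conf)
        cp "$conf" /etc/qemu-server/
        qm start "$vmid"
    done
fi
```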

*inside Node N2*
- If the node is down or hung or the OS doesn't work: well, the script 'start-cloned-vm' simply won't run, nothing happens, nothing to say (obviously).
- If the node is up and the DST tunnel works: all its VMs have been (q)migrated to node N1, so when the PVE link becomes available again we'll have no problems at all.
- If the node is up and the DST tunnel is down: the node knows that all its VMs (previously cloned) have been started on node N1, so it simply stops all its VMs and runs an immediate shutdown -h (suicide). Maybe a simple stonith would be even better: node N2 would be killed by N1.
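And the N2 side, again only a sketch (the stonith alternative is not shown):

```bash
#!/bin/bash
# start-cloned-vm (N2 side) -- sketch: PVE link lost and DST tunnel also down
. /etc/drbd-ha.conf

if ! ssh -o BatchMode=yes -o ConnectTimeout=5 root@"$PEER_DRBD_IP" true 2>/dev/null; then
    # our VMs are being started on the peer from its freezer:
    # stop everything here and commit suicide to avoid running them twice
    for conf in /etc/qemu-server/*.conf; do
        [ -e "$conf" ] || continue
        qm stop "$(basename "$conf" .conf)"
    done
    shutdown -h now
fi
```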

The script 'start-cloned-vm' acts as a kind of handler for PVE link failures: it can be invoked manually, by a Nagios plugin, triggered by a PVE-not-in-sync event or by hotplug2; otherwise it may be daemonized or scheduled in a crontab file. It also sends email warnings to a configurable recipient.
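For example, the crontab file could look like this ('check-pve-link' is just a placeholder name for a wrapper around the icmp check sketched above):

```bash
# /etc/cron.d/proxmox-drbd -- hypothetical entries, adjust paths and timings
# keep the freezers up to date every 10 minutes
*/10 * * * *  root  /usr/local/sbin/clone-vm
# poll the PVE link every minute and fire the handler when it breaks
*    * * * *  root  /usr/local/sbin/check-pve-link
```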

When the cluster returns to its normal operation, i.e. all the links work, you will definitely have to check the DRBD disks and fix some simple situations.
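Typically that means looking at /proc/drbd and, in the worst case, telling DRBD which node holds the stale data; e.g. for a resource called r0, with DRBD 8.3 syntax (check the DRBD documentation for your version):

```bash
# check the resource state once the cluster is back
cat /proc/drbd

# if DRBD reports a split brain, run on the node whose data must be thrown away:
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# and on the survivor, if it is in StandAlone state:
drbdadm connect r0
```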

All the scripts are at a very early stage and the code must be cleaned and surely improved; anyway the stuff works, at least in my tests, and is available from:
http://subversion.assembla.com/svn/proxmox-drbd/

I invite you to test and debug the scripts, add improvements or whatever else you think is right. As usual, comments and tips are welcome.

Thanks for your patience in reading the post.
 