Suggestions on 2 node HA

totalimpact

Or 3 nodes? Some background.
Current setup:
- 2 PVE 1.7 hosts (no cluster)
- Dell R510: 2x 80 GB Windows VMs + 1x 80 GB CentOS VM (not a CT)
- Supermicro server: 2x 30 GB CentOS CTs
- Both installed in the same rack, no HA.

I will be upgrading these to a pair of Dell R730s, and want to set up one of them in another building over a 10 Gb fiber link. I would like to set up some sort of cluster fs, like Ceph or DRBD. I have been testing the new PVE 4.2, and would really like to use LVM-thin (LVM-t).

Goals:
- have live data in 2 buildings
- live migration not needed, VMs can be shut down to migrate.
- failure recovery in under 20 minutes
- snapshot option would be ideal (LVM-t or qcow)

I have done this in the past with 2 servers on DRBD. I know DRBD is a mess as far as getting out of sync, quorum, and fencing go. I only have the budget for 2 new servers, so I could repurpose the R510 as a fencing node, or as a Ceph monitor if needed. I'm not familiar with Ceph, so I don't know its drawbacks.

My dirty option is to run my VMs on normal local drives, using a standard PVE install on LVM-t, and send backups to a drive on the remote machine. I can restore an 80 GB vzdump in about 40-50 minutes on one of these machines.
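
A quick back-of-the-envelope check of that restore time against the 20 minute goal, as a rough Python sketch. The 40-50 minutes for 80 GB is what I measured; the linear scaling with image size and the function names are just assumptions for illustration.

```python
# Rough restore-time estimate for the backup/restore fallback.
# Measured: one 80 GB vzdump restores in roughly 40-50 minutes.
# Assumption: restore time scales linearly with image size.

RECOVERY_GOAL_MIN = 20


def restore_minutes(size_gb, measured_gb=80, measured_min=45):
    """Scale the measured restore time to another image size."""
    return size_gb * measured_min / measured_gb


def throughput_mb_s(size_gb, minutes):
    """Effective restore throughput in MB/s (1 GB = 1000 MB)."""
    return size_gb * 1000 / (minutes * 60)


if __name__ == "__main__":
    for minutes in (40, 50):
        print(f"80 GB in {minutes} min ~ {throughput_mb_s(80, minutes):.0f} MB/s")
    single = restore_minutes(80)
    print(f"one 80 GB VM: {single:.0f} min, goal met: {single <= RECOVERY_GOAL_MIN}")
```

So even a single 80 GB restore lands well past the 20 minute target, which is why I only consider this the fallback.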
 
If you only need "cold standby", you could use ZFS + pve-zsync to replicate the volumes to both nodes and do a manual failover. HA with two nodes is not possible, and Ceph with two nodes is also not possible (at least not in a sane way). With a third quorum node you can achieve HA, but without shared storage you don't really benefit from it.
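
To make the quorum point concrete, here is a minimal sketch of the majority rule, assuming one vote per node and no special two-node or tie-breaker settings. The function names are made up for illustration and are not part of any Proxmox tooling.

```python
def votes_needed(total_nodes):
    """Strict majority, assuming one vote per node."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes, nodes_up):
    """True if the surviving nodes still hold a majority of votes."""
    return nodes_up >= votes_needed(total_nodes)


if __name__ == "__main__":
    for total in (2, 3):
        survivors = total - 1  # one node (or one building) lost
        print(f"{total} nodes, 1 down: quorum = {has_quorum(total, survivors)}")
```

With 2 nodes the survivor holds 1 of 2 votes and cannot decide on its own; with 3 nodes the two survivors still hold a majority.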
 
Simply put - you're not going to get HA from 2 nodes.

What fabian suggests is what I'm using, as I've got a similar situation to yours. No need to restore from a backup - just start the already existing VM on the other server. Very easy and quick recovery with a cold standby. The downtime is limited to me being notified, logging into the web interface of the backup server, and starting the needed VM. Around 5 minutes, depending on the services that the VM needs to run.

My backups are stored on a NAS, but I found that using this NAS for live (shared) data just wasn't giving good results in my situation without a 10 Gb fiber connection. So I have to use local storage on each blade. Not the end of the world, but it did limit my choices on redundancy and HA. I'm glad that pve-zsync is available and is quite easy to set up and work with.
 
I figured 3 would be needed... and I could do that using my retired R510, but I wouldn't want to rely on it for storage, and I am trying to avoid some sort of expensive NAS/SAN in the mix.

So it sounds like pve-zsync is what I need to keep my data "near realtime" on both boxes and easily spin back up on the other box. I read the wiki and see it can be set for 15 minute syncs; I assume this is just syncing diffs? How is snapshotting with pve-zsync? Fast/easy in the GUI? Do you run qcows on top of the ZFS, or raw? Any downsides to this other than no live migration (don't care)?

On a side note - you should look at Nortel/Avaya 5530 or 4524 switches. Since Avaya took over, these can be had for $100~200 (eBay): 24 port 1 Gb + 4x 10 Gb SFP or XFP on the 4524. L3 managed, DC backup power, and firmware still currently updated by Avaya. They stack with other switches in the same family, such as the 4850 (48 port 1 Gb PoE+), on dedicated stacking ports with 80 Gb uplinks... I don't work for Nortel ;)

https://www.avaya.com/usa/documents/avaya-ethernet-routing-switch-4500-series-dn4816.pdf
https://www.avaya.com/usa/documents/avaya_ethernet_routing_switch_5000_dn5098.pdf
 
Yeah, having a fiber NAS/SAN was just not going to happen for me because of my university's unwillingness to run fiber inside buildings (between buildings, yes). Some use cases would be fine with a 1 Gb Ethernet connection, but not a busy database.

The 15 minutes is the default interval. You can set it to whatever you want by editing the cron job(s). I thought about lowering the time for my case, but didn't want syncs to overlap due to not being able to push data fast enough through the network. 15 minutes is an acceptable loss for us. The databases are backing up to a local drive every 5 minutes, so it's even less with some VMs.
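
A rough way to sanity-check an interval before editing the cron job, sketched in Python. The changed-data figure, the link rate, and the efficiency factor below are made-up placeholders, not measurements from my setup, and the function names are hypothetical.

```python
def transfer_minutes(changed_gb, link_mbit_s, efficiency=0.7):
    """Minutes to push one incremental sync over the link.

    efficiency is an assumed fraction of line rate actually achieved.
    """
    usable_mb_s = link_mbit_s / 8 * efficiency  # megabytes per second
    return changed_gb * 1000 / usable_mb_s / 60


def sync_fits(changed_gb, link_mbit_s, interval_min, efficiency=0.7):
    """True if one sync should finish before the next one is due."""
    return transfer_minutes(changed_gb, link_mbit_s, efficiency) < interval_min


if __name__ == "__main__":
    # Placeholder numbers: 20 GB changed per interval, 1 Gbit/s link.
    for interval in (15, 5, 2):
        print(interval, "min:", sync_fits(20, 1000, interval))
```

If the estimated transfer time gets close to the interval, syncs are at risk of overlapping and the interval should stay longer.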

I'm aware of Avaya's offerings - my previous employer was a VAR for them. Very good equipment at a reasonable price. I wish I could install my own equipment like this, but the University gets testy about such things :(

EDIT: pve-zsync only works with VMs and not the entire drive/storage, so it wouldn't work with snapshots/backups in that way. My snapshots are being sent to a different storage, a single point NAS where speed isn't an issue. The GUI does provide a very easy interface to get this up and going.
 
Cool... I'm going to try to light up a test lab on a couple of OptiPlexes with SSDs. Is that possible on a single drive? I just pick the ZFS option in the installer, right? Or does it need separate storage media for the zsync directory?
 
So I got a pair of servers loaded up. The only thing I can't piece together is config sync: the wiki seems to imply VM confs are in /var/lib/pve-zsync/, but it appears that is not set up automatically, and I am not sure how to set that up as the config location. What are you using to keep your configs in sync?
 
