What is the best way to handle High Availability between nodes with different storage setups?

kocherjj · Dec 3, 2022

I have read through the documentation, but I'm not sure I'm following it well, because I'm not even sure whether what I want to accomplish is possible with Proxmox, so I thought I'd ask here.

I have a Proxmox node set up and hosting several containers and VMs. This started as a hobby and a chance to learn, but as I have experimented with various services, friends and family have started using them, and I'm starting to realize that I need to improve reliability before something catastrophic happens.

My main node has a 6TB ZFS pool made up of three mirrored sets of 2TB M.2 NVMe drives. This setup already saved my hide once: one of the drives failed, and I was able to get it replaced under warranty and the pool rebuilt without any hitch in operation. However, I decided I needed something more, so I set up two more lower-powered nodes, each with a 6TB spinning disk, and created a cluster joining them to my primary node. I experimented with a few old laptops first and was able to get VMs and containers working with High Availability using Ceph, testing by the simple expedient of disconnecting network cables from nodes and watching what happened. However, now that I have this system set up, it appears that it won't be so simple.

With my main array being ZFS, is Ceph even an option? If it is, is it the best option?

My ultimate goal is to have live copies of several guest containers and VMs on the 6TB drives of both node2 and node3. The idea is that if node1 goes down for some reason, or even if I just want to test a Proxmox update on one node before rolling it out to all of them, the guests will quickly restore on node2 or node3 (or, better yet, be balanced between them), and when node1 comes back up, the services will be migrated back to the fast hardware. I don't mind doing this last step manually if needed, but I would prefer automation.

What would others here recommend that I do to achieve this with the hardware I have to work with, or did I go about this all wrong?

Thanks,

LnxBil · Dec 3, 2022

kocherjj said:
With my main array being ZFS, is Ceph even an option?

No.

kocherjj said:
What would others here recommend that I do to achieve this with the hardware I have to work with, or did I go about this all wrong?

High availability is not "quickly restore". High availability is only possible if you have a (dedicated or distributed) shared storage model, and by that I mean "the same data", not data that is restored somehow. With distributed shared storage, where every node HAS the same data, this can only be achieved with Ceph, GlusterFS or GFS2. As with most cluster and high-availability solutions, you would need 3 nodes (or 2 and a quorum device). Any distributed storage system has the problem that write performance is always (if you want consistent writes) limited by the slowest storage. Having NVMe in one node and a hard disk in the others will EXTREMELY limit your write throughput and latency. Another way would be a dedicated shared storage outside of your PVE cluster that is mounted via NFS/CIFS or iSCSI (or FC).
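The two rules of thumb above (a majority of votes is needed for quorum, and consistent writes wait for the slowest replica) can be illustrated with a toy model. This is a simplified sketch, not Proxmox, Corosync or Ceph code: it assumes every node and the quorum device count as one vote each, and that a write is acknowledged only after every replica has persisted it. The latency figures are hypothetical example values, not measurements.

```python
# Toy model (not Proxmox/Corosync/Ceph code) of two points from the post:
# 1. a cluster keeps quorum only while a strict majority of votes is present;
# 2. a write acknowledged only after ALL replicas persist it waits for the slowest one.


def quorum_votes(total_votes: int) -> int:
    """Smallest strict majority of total_votes."""
    return total_votes // 2 + 1


def tolerated_failures(total_votes: int) -> int:
    """How many one-vote members can fail while a majority remains."""
    return total_votes - quorum_votes(total_votes)


def sync_write_latency_ms(replica_latencies_ms: list[float]) -> float:
    """Latency of a synchronous, consistent write: set by the slowest replica."""
    return max(replica_latencies_ms)


if __name__ == "__main__":
    # 2 votes = two nodes alone; 3 votes = three nodes, or two nodes plus a
    # quorum device (assumed here to contribute exactly one vote).
    for votes in (2, 3):
        print(f"{votes} votes: quorum={quorum_votes(votes)}, "
              f"tolerates {tolerated_failures(votes)} failure(s)")

    # Hypothetical per-write latencies: one NVMe mirror and two spinning disks.
    latencies = [0.1, 8.0, 9.5]
    print(f"sync write latency: {sync_write_latency_ms(latencies)} ms")
```

With two votes, losing either member loses quorum, which is why a third node or a quorum device is needed; and adding a fast NVMe replica does nothing for write latency while the hard disks are still in the write path.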

Without shared storage, you will not be able to do quick live migrations or have any fast failover, and you will almost certainly lose data in case of a node failure. Such a setup is extremely hard (and expensive) to achieve at home. What most users do instead is run a non-high-availability system with asynchronous ZFS replication and do (non-quick) live migrations over the replication to the other node in case of a planned switchover, but you will lose data in an unplanned failover scenario due to the asynchronous replication.
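Why asynchronous replication is fine for a planned switchover but loses data in an unplanned failover can be shown with a small sketch. This is a toy model under stated assumptions, not Proxmox code: replication is assumed to run at fixed intervals starting at minute 0 and to finish instantly, and a planned switchover is assumed to perform one last sync right before moving the guest. The 15-minute interval and the write timestamps are hypothetical example values.

```python
# Toy model of asynchronous replication (not Proxmox code): the replica only
# holds what existed at the last completed sync. The interval and the write
# timestamps below are hypothetical example values.


def last_completed_sync(failure_min: int, interval_min: int) -> int:
    """Latest sync boundary at or before the failure, assuming syncs start at
    minute 0, repeat every interval_min minutes and finish instantly."""
    return (failure_min // interval_min) * interval_min


def lost_writes(write_times_min: list[int], last_sync_min: int) -> list[int]:
    """Writes made after the last completed sync never reached the replica."""
    return [t for t in write_times_min if t > last_sync_min]


if __name__ == "__main__":
    interval = 15
    writes = [5, 20, 31, 38]

    # Unplanned failover: node1 dies at minute 40; the last sync was at minute 30.
    sync = last_completed_sync(40, interval)
    print("unplanned failover, lost writes at minutes:", lost_writes(writes, sync))

    # Planned switchover at minute 40: a final sync runs first, so nothing is missing.
    print("planned switchover, lost writes at minutes:", lost_writes(writes, 40))
```

In this model the worst case for an unplanned failure is losing almost one full replication interval of writes; shortening the interval narrows that window but never closes it, which is the difference from shared storage.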

kocherjj · Dec 4, 2022

Thank you for the response,

At this point, HA reliability is more important than NVMe speed. It was fun to set up the NVMe array and get it working smoothly, but I honestly don't think my guests are seeing a significant benefit from the extra speed due to other bottlenecks, and I could probably find better uses for those 2TB NVMe drives as laptop upgrades. You mentioned that High Availability is not "quickly restore", but in my experimentation with Ceph it still wasn't a real-time failover when one node went down. It took a few minutes to "quickly restore", and if that is not what is meant in the Proxmox documentation by "High Availability", then I must have misconfigured something. I have read extensively through the documentation and numerous tutorials, but many of them seem to make assumptions about configuration that are never explained.

If I approach this differently, forgetting everything I have currently done, and set up my 3 Proxmox nodes with a 6TB HDD in each, used exclusively for guest VMs and CTs, how would I configure distributed shared storage for High Availability of guests with real-time failover? I am comfortable working on the command line but would prefer to stick with a setup that is native to the Proxmox GUI for consistency and process documentation.

LnxBil · Dec 5, 2022

kocherjj said:
You mentioned that High Availability is not "quickly restore", but in my experimentation with Ceph it still wasn't a real-time failover when one node went down.

Maybe I didn't explain it well enough, but you do have distributed shared storage with Ceph. Therefore, the failover is instantaneous, without any "on-disk" data loss. You will, however, lose uncommitted (or, more precisely, unflushed) content. This IS the kind of high availability you want, and Ceph is the ONLY SUPPORTED multi-node distributed shared storage system. Before Ceph there was DRBD, but that was dropped a few PVE versions ago.
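The difference between "on-disk" data (kept) and unflushed content (lost) on failover can be sketched with a toy write-back buffer. This is a hypothetical model for illustration only, not Ceph or QEMU code: the class name `ToyDisk` and its methods are invented here, the `durable` list stands in for the shared Ceph storage, and the `buffer` list stands in for data still held in RAM on the node that dies.

```python
# Toy model (not Ceph or QEMU code): writes sit in a volatile buffer on the
# running node until they are flushed to shared storage. After a node crash,
# the guest restarted elsewhere only sees what was flushed.


class ToyDisk:
    def __init__(self) -> None:
        self.durable: list[str] = []  # on shared/replicated storage; survives a node crash
        self.buffer: list[str] = []   # volatile; lives only on the running node

    def write(self, block: str) -> None:
        """Accept a write into the volatile buffer only."""
        self.buffer.append(block)

    def flush(self) -> None:
        """Persist everything buffered so far to durable storage."""
        self.durable.extend(self.buffer)
        self.buffer.clear()

    def crash(self) -> list[str]:
        """Simulate the node dying: the buffer is lost, and the guest restarted
        on another node sees only the durable data."""
        self.buffer.clear()
        return list(self.durable)


if __name__ == "__main__":
    disk = ToyDisk()
    disk.write("a")
    disk.write("b")
    disk.flush()
    disk.write("c")      # never flushed
    print(disk.crash())  # ['a', 'b']
```

Nothing has to be restored after the crash because the durable data is already on shared storage; only the last unflushed write ("c") is gone.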
