I am building a cluster with 2 PVE hosts and a quorum host. Each PVE host is running local storage with hardware RAID. I would like to set it up so that in the event of a node failure, all VMs running on one host can be restarted on the other. Ideally this would be done with clustering, but replication with a 5-10 minute rollback is acceptable. Here are the ideas I've had and their problems:
- Replication over ZFS
Problem: ZFS and hardware RAID do not go well together. The combination is not recommended for many reasons, and the official documentation advises against it.
- Clustered hosts with Ceph storage
Problem: It doesn't look like I can run Ceph shared storage with the quorum host acting as a MON, the way the quorum host can take part in PVE clustering.
- Manual replication with something like rsync
Problem: This could work, but there is no way (that I know of) to automatically restart the VMs upon the loss of a host.
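To make the rsync idea concrete, here is a minimal sketch of what a periodic copy job might look like, run from cron or a systemd timer every 5-10 minutes. The host name `pve2` and the image path `/var/lib/vz/images` are assumptions for illustration, as is the helper name `build_rsync_command`. It only builds and prints the command, it does not address the automatic restart problem, and copying the disk image of a running VM this way is not guaranteed to give a consistent copy.

```python
import shlex


def build_rsync_command(src_dir, dest_host, dest_dir, bwlimit_kbps=None):
    """Build an rsync command that mirrors src_dir to dest_host:dest_dir.

    Hypothetical helper for illustration only; it returns the argument
    list and leaves running it (e.g. via subprocess.run) to the caller.
    """
    # -a: archive mode, --inplace: update large image files in place,
    # --delete: remove files on the target that no longer exist locally.
    cmd = ["rsync", "-a", "--inplace", "--delete"]
    if bwlimit_kbps is not None:
        # Optional cap so replication does not saturate the link.
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    # A trailing slash on the source copies the directory contents
    # rather than the directory itself.
    cmd.append(src_dir.rstrip("/") + "/")
    cmd.append(f"{dest_host}:{dest_dir.rstrip('/')}/")
    return cmd


if __name__ == "__main__":
    # Assumed values: second PVE host named "pve2", images in the
    # default local directory storage path.
    cmd = build_rsync_command("/var/lib/vz/images", "pve2", "/var/lib/vz/images")
    print(shlex.join(cmd))
```

Even with something like this in place, starting the copied VMs on the surviving host would still be a manual step.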
Let me know if there are any options given the hardware limitations.