I am looking to design a production HA Proxmox VE cluster for a small company, so cost is the major constraint. The cluster will run production VMs (including DNS/DHCP) and development VMs, and will use manually assigned IPs.
These are the setups I am considering so far; if I missed something, please suggest it, but keep the cost limitation in mind.
Option A: 2-node + QDevice with Ceph
- The 2 main nodes will be new servers with lots of RAM and CPU threads, plus a few hot-plug disks and a separate high-speed (10GbE) network between the nodes. The QDevice could sit on a slower net.
 - Provides automatic failover of compute and storage
 - Near instant replication of virtual disks as writes are passed through to the replica on the other node
 - Unclear how to set up a QDevice for both Proxmox itself and Ceph by just installing software on a less powerful Debian node (see the sketch after this list).
 - Rumors that Ceph will insist on keeping 3 copies of everything instead of the 2 in RAID 1
 - Rumors that Ceph will do massive amounts of unneeded data copying on the other node when one of the nodes is taken offline
 - Unclear how this handles a full power outage that takes out both nodes at nearly the same time as the UPS runs empty.
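
For the QDevice question, a minimal sketch of what I understand the corosync side to look like (the Debian host's IP is a placeholder):

```
# On the small Debian box (it only provides a quorum vote, not storage):
apt install corosync-qnetd

# On both PVE nodes:
apt install corosync-qdevice

# From one PVE node, register the QDevice (10.0.0.5 is a placeholder IP):
pvecm qdevice setup 10.0.0.5

# The cluster should now report 3 expected votes:
pvecm status
```

My assumption is that this only covers Proxmox quorum; Ceph keeps its own quorum, so the Debian box would additionally have to run a Ceph monitor (or Ceph stays at 2 MONs and cannot lose either node), which the QDevice does not handle.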
 
Option B: 2-node + QDevice with ZFS replication
- Same hardware as option A
 - Provides automatic failover of compute, maybe storage too
 - Delayed replication of virtual disks, causing failed-over VMs to revert to older data (see the replication sketch after this list).
 - The QDevice apparently only needs to deal with Proxmox corosync; no extra work for ZFS replication.
 - Unclear if ZFS replication keeps 2 or 4 copies of data (1 or 2 per node).
 - Hopefully will have a less complicated reaction to a full power outage that takes out both nodes at nearly the same time as the UPS runs empty.
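
For the replication-delay point, a sketch of how I believe the storage replication schedule is set per guest (VMID 100 and node name pve2 are placeholders); as far as I can tell, a failed-over VM restarts from the last completed sync, so the schedule is effectively the maximum data loss:

```
# Replicate guest 100's disks to the other node every 5 minutes
# (job id 100-0; "pve2" is a placeholder node name):
pvesr create-local-job 100-0 pve2 --schedule "*/5"

# Show job state and the time of the last successful sync:
pvesr status

# Let the HA manager restart the guest on the surviving node:
ha-manager add vm:100 --state started
```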
 
Option C: 3-node with Ceph
- Similar hardware to the 2-node options but with less memory per node and a higher total cost.
 - Provides automatic failover of compute and storage
 - Near instant replication of virtual disks as writes are passed through to the replica on other nodes
 - Rumors that Ceph will insist on keeping 3 copies of everything instead of the 2 in RAID 1
 - Rumors that Ceph will do massive amounts of unneeded data copying on the other nodes when one of the nodes is taken offline (see the Ceph commands after this list).
 - Unclear how this handles a full power outage that takes out all or most nodes at nearly the same time as the UPS runs empty.
 - More expensive due to the extra node and need for a 10GbE switch to connect 3 nodes on each backend net.
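
On the replica-count and rebalancing rumors, a sketch of the knobs I think are involved (the pool name is a placeholder; 3/2 is the Ceph default, and the size-2 lines are shown only to illustrate that it is tunable, not as a recommendation):

```
# Inspect the replica count on a pool ("vm-pool" is a placeholder):
ceph osd pool get vm-pool size
ceph osd pool get vm-pool min_size

# Default is size=3/min_size=2; a 2-copy pool is possible but risky:
ceph osd pool set vm-pool size 2
ceph osd pool set vm-pool min_size 2

# For planned maintenance, stop Ceph from marking OSDs out and
# rebalancing while a node is briefly offline:
ceph osd set noout
# ...reboot/upgrade the node, then:
ceph osd unset noout
```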
 
Option D: 3-node with ZFS replication
- Same hardware as option C
 - Provides automatic failover of compute, maybe storage too
 - Delayed replication of virtual disks, causing failed-over VMs to revert to older data.
 - Unclear if ZFS replication keeps 2 or 4 copies of data, i.e. 1 or 2 per node (see the check after this list).
 - Hopefully will have a less complicated reaction to a full power outage that takes out all or most nodes at nearly the same time as the UPS runs empty.
 - More expensive due to the extra node and need for a 10GbE switch to connect 3 nodes on each backend net.
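
For the copy-count question, a sketch of how I would check it (dataset, group and node names are placeholders). My understanding is that replication puts one copy of each disk on each target node, and the ZFS `copies` property (default 1) decides whether a node stores extra copies on top of its own mirror/RAIDZ redundancy:

```
# Check the per-node copy count (default is 1, i.e. one copy per node
# plus whatever redundancy the vdev layout itself provides):
zfs get copies rpool/data

# Keep production guests on specific nodes with an HA group
# ("prod" and the node names are placeholders):
ha-manager groupadd prod --nodes "pve1,pve2,pve3"
ha-manager add vm:100 --state started --group prod
```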
 
Option E: 2-node + QDevice with shared SAN storage
- Same hardware as option A/B, but without the hot-plug disks, plus a 3rd-party HA SAN storage solution and hardware.
 - Provides automatic failover of compute, with shared access to HA storage via SCSI locking on the SAN or Proxmox's own coordination of access (see the storage sketch after this list).
 - Maybe the QDevice can run on the SAN hardware, maybe on some other Debian server.
 - The QDevice apparently only needs to deal with Proxmox corosync; no extra work for the SAN's HA.
 - Hopefully will have a less complicated reaction to a full power outage that takes out both nodes at nearly the same time as the UPS runs empty.
 - More expensive due to the extra SAN solution and potential need for a 10GbE switch to connect 2 nodes to SAN.
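
For the shared-access point, a sketch of how I expect the SAN would be attached (portal IP, IQN, VG and storage names are placeholders). My understanding is that marking the LVM storage as shared lets Proxmox coordinate access itself, without a cluster filesystem, at the cost of no snapshots on plain LVM:

```
# Attach the SAN's iSCSI target to the cluster:
pvesm add iscsi san-iscsi --portal 10.0.1.10 \
    --target iqn.2005-10.org.example:vmstore --content none

# On one node, create a volume group on the exported LUN
# (replace /dev/sdX with the actual LUN device):
vgcreate vg_san /dev/sdX

# Register the VG as shared LVM storage usable by both nodes:
pvesm add lvm vmstore --vgname vg_san --shared 1 --content images
```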