Design options for a 2-node cluster in production

Local+replicated ZFS will be the cheapest option and give reasonable performance, but it does not provide realtime, hands-off HA. The storage does not scale beyond one box. Replication is asynchronous and runs periodically rather than in realtime, so the replica always lags the source somewhat.
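To put a rough number on "not quite in realtime": with periodic, asynchronous replication, the worst case is a node failing just before a replication run finishes, so you lose everything written since the previous run started. A minimal Python sketch follows; the function name and the 15-minute interval and 2-minute transfer time are purely illustrative assumptions, not measured values.

```python
def worst_case_loss_minutes(interval_min: int, transfer_min: int) -> int:
    """Upper bound on lost writes (in minutes) if the source node dies.

    Assumes replication jobs start every `interval_min` minutes and each
    takes `transfer_min` minutes to ship its snapshot delta. If the node
    fails just before a transfer completes, the replica still holds the
    snapshot from the previous run, so everything since then is lost.
    """
    if interval_min <= 0 or transfer_min < 0:
        raise ValueError("interval must be positive, transfer time non-negative")
    return interval_min + transfer_min


if __name__ == "__main__":
    # Hypothetical figures: a job every 15 minutes, 2 minutes per transfer.
    print(worst_case_loss_minutes(15, 2))  # 17
```

Shortening the interval shrinks the window, but it never reaches zero the way synchronous shared storage does.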

SAN will not be cheaper than local ZFS, and it will not be faster (all else being equal). Depending on the storage vendor's feature set, you may also lose thin provisioning and snapshot capability unless the vendor explicitly provides them. Additionally, it would be a mistake to assume that you get any redundancy from a single SAN enclosure. Yes, central shared storage can survive a host failure, but what about a storage or switch failure? Putting all your storage in one expensive basket does not change your situation much from where it is right now.

A good SAN solution would involve 2 or more replicated enclosures and LACP networking with a redundant switch stack. Somebody stop me if I'm off base, but the cost and overhead of SAN relative to the performance and reliability you get in return is extremely high, and as a result I do not believe SAN deployments are particularly widely used in small-medium enterprise Proxmox environments. There are some very badass SAN products out there, with higher specs than my VM hosts, so I have to wonder about their price and the bottom-line relative value to the small-medium end customer.

A good SAN has redundant controllers and redundant switch fabrics, with each controller connected to both fabrics. You can do LACP, but IMHO it's better to do multipath. Typically you have a shared chassis, but the compute and network modules are fully independent, with redundant power, etc. You save on storage because you don't have to replicate between the nodes, which doubles the capacity you need, and you only pay the RAID overhead once. You can get really good performance at a low cost with an all-flash SAN. You can get a fully redundant all-flash SAN with 8x25GbE ports starting under $20k. If you have a DR site, then you can do synchronous replication to that. Unlike Ceph, a dedicated SAN doesn't take RAM and compute resources away from the cluster nodes to provide storage, and Ceph also has significantly lower performance.
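As a back-of-the-envelope illustration of the capacity point, here is a small Python sketch. The function name, the 10 TB usable figure, and the 0.5 efficiency (mirrored pairs) are assumptions picked for the example, not numbers from any particular product.

```python
def raw_tb_needed(usable_tb: float, copies: int, raid_efficiency: float) -> float:
    """Raw disk capacity required for a given amount of usable space.

    copies:          how many full copies of the data are kept
                     (2 for two nodes replicating to each other, 1 for a SAN)
    raid_efficiency: usable fraction of raw capacity after RAID overhead
                     (0.5 for mirrors; higher for parity layouts)
    """
    if copies < 1 or not 0 < raid_efficiency <= 1:
        raise ValueError("copies must be >= 1 and efficiency in (0, 1]")
    return usable_tb * copies / raid_efficiency


if __name__ == "__main__":
    usable = 10.0  # hypothetical: 10 TB of VM data
    # Two nodes, each holding a full mirrored copy.
    print("local+replicated ZFS:", raw_tb_needed(usable, copies=2, raid_efficiency=0.5))
    # One shared array, RAID overhead paid once.
    print("single SAN array:    ", raw_tb_needed(usable, copies=1, raid_efficiency=0.5))
```

With these assumed inputs the replicated setup needs twice the raw disk of the shared array; a second replicated enclosure or a DR copy would of course add its own capacity back in.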

Given that a SAN provides realtime HA, I recommend it over local+replicated ZFS.
 
