3-node HA cluster and enterprise shared storage: mid-2022 considerations - need advice

MarvinFS

Jun 22, 2022
Hey people,

Could you kindly clarify a few things regarding enterprise-grade virtualization and Proxmox?
I'm originally from the VMware camp, but I'm now experimenting with a migration to Proxmox, as we are a non-profit organization and just don't have the budget.
Traditionally, for redundant HA virtualization I'd deploy 3 hosts in a cluster plus 2 fully shared physical storage appliances (DAS/SAS, FC-SAN, or 10G iSCSI - doesn't really matter): one for VMs and data, and a second one, possibly with deduplication, solely for backups (I always used Veeam with the direct SAN transport backup path).
I've read many threads here and watched lots of YouTube videos on the topic.

Correct me if I'm wrong with the following:

1. As there are no readily available cluster-aware FS solutions here (similar to VMFS), and since ZFS doesn't really work well as cluster-shared storage with snapshots and all the whistles, I assume a block storage solution is not really an option for us, and the only feasible solution right here, right now is the Proxmox Cluster File System (pmxcfs) for VM configs plus NFS cluster-shared storage with qcow2 file-based data disks (most probably just a TrueNAS NFS share). Am I correct?
Maybe there are better solutions out there now in mid-2022. This would allow me to use an NFS backend on primary storage as a datastore with immediately available snapshots, VM migration, availability for HA-managed VMs, and easy, trouble-free backups (backups embedded in Proxmox).

2. In that case, what about FC-SAN or SAS DAS? As far as I can tell, they don't really make sense here, since neither LVM-thin nor ZFS works on shared block storage. Am I correct?

3. As for NFS redundancy (the way I see it), I assume it is absolutely required to implement link aggregation from storage to hosts, obviously on an isolated storage LAN.

4. PBS (Proxmox Backup Server) - any particular reason I'd need it, other than tape library support? I mean, for our small infrastructure the internal backup schedules at cluster level, which come out of the box, are totally enough... As for pruning, well, I'll figure something out, as those are plain files anyway.

5. I enjoy reading @bbgeek17's posts and wonder if I want/need the Blockbridge storage driver for Proxmox with my 3rd-party storage appliances. I'm not sure whether you guys have some kind of community version. Pricing for the commercial version is a question as well.

6. Any gotchas or tips for implementing an NFS data backend in production? Am I missing something with such a setup? Honestly, it feels too good; are there any drawbacks? We have about 40 VMs, mixed: Windows, AD, file servers, Linux servers, MS SQL, MS Exchange, with 9 TB of storage. The plan is to migrate to open source completely, so we'll drop everything Microsoft-based except the client machines.

7. In some threads here, people have reported poor 10G performance with Proxmox + TrueNAS, like close to 5 Gb/s. Does anyone have fresh experience with that? For example: https://forum.proxmox.com/threads/prox-truenas-10g-connection-sharing-results.110202/

Sorry for such a long list; I hope it will be useful to have it all in one place in case other people have similar setups.

Regards,
MarvinFS
 
1. As there are no readily available cluster-aware FS solutions here (similar to VMFS), and since ZFS doesn't really work well as cluster-shared storage with snapshots and all the whistles, I assume a block storage solution is not really an option for us, and the only feasible solution right here, right now is the Proxmox Cluster File System (pmxcfs) for VM configs plus NFS cluster-shared storage with qcow2 file-based data disks (most probably just a TrueNAS NFS share). Am I correct?
Hi @MarvinFS - you do have a few DIY options, e.g. OCFS2, but they require advanced expertise and come with no support from the PVE folks. You will always use pmxcfs for configuration availability - that's a hard requirement. And, of course, NFS is always a semi-safe fallback, but a lot depends on the NFS implementation - they are not all equal.
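To make that concrete, a minimal sketch of such an NFS datastore; the storage ID, server address and export path below are placeholders, assuming a hypothetical TrueNAS box at 10.10.10.10:

    # run once on any node, the definition is cluster-wide (placeholder ID/IP/export)
    pvesm add nfs truenas-nfs --server 10.10.10.10 --export /mnt/tank/pve \
        --content images,iso --options vers=4.2

    # ...which simply becomes an entry in /etc/pve/storage.cfg:
    nfs: truenas-nfs
        server 10.10.10.10
        export /mnt/tank/pve
        path /mnt/pve/truenas-nfs
        content images,iso
        options vers=4.2

The share gets mounted on every node under /mnt/pve/<storage-id>, so qcow2 disks living there are visible cluster-wide, which is exactly what live migration and HA need.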
Maybe there are better solutions out there now in mid-2022. This would allow me to use an NFS backend on primary storage as a datastore with immediately available snapshots, VM migration, availability for HA-managed VMs, and easy, trouble-free backups (backups embedded in Proxmox).
There are many NFS storage providers out there, from complete DIY to commercial offerings that come with support. The critical thing to keep in mind is that a solo NFS server becomes your single point of failure. So if you are building a highly available environment, each piece of the puzzle needs to be able to provide HA: PVE, network, storage.
As for snapshots, you are limited to either qcow2 on NFS with qcow2 snapshots, or out-of-band backend storage snapshots without application/VM integration.
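With qcow2 files on such a store, the snapshot workflow is the standard built-in one; roughly (VM ID 101 and the snapshot name are just examples):

    qm snapshot 101 pre-upgrade      # take a snapshot of VM 101
    qm listsnapshot 101              # list existing snapshots
    qm rollback 101 pre-upgrade      # roll the VM back
    qm delsnapshot 101 pre-upgrade   # remove the snapshot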
2. In that case, what about FC-SAN or SAS DAS? As far as I can tell, they don't really make sense here, since neither LVM-thin nor ZFS works on shared block storage. Am I correct?
You can only use shared storage if the layer above it knows that multiple clients can read/write the same blocks at the same time. In the case of LVM, space must be pre-allocated, i.e. thick, which prevents snapshot use. It's certainly possible to use a SAN or SAS DAS, but you are going to have to build and manage your own layer on top of it.
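For completeness, a rough sketch of that "thick LVM on a shared LUN" variant; the multipath device, VG and storage names are placeholders, and note it carries raw disks only, with no thin provisioning and no snapshots:

    # one-time, on a single node: put a volume group on the SAN/DAS LUN (placeholder device)
    pvcreate /dev/mapper/mpatha
    vgcreate vg_san /dev/mapper/mpatha

    # publish it cluster-wide as shared, thick LVM
    pvesm add lvm san-lvm --vgname vg_san --shared 1 --content images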
3. As for NFS redundancy (the way I see it), I assume it is absolutely required to implement link aggregation from storage to hosts, obviously on an isolated storage LAN.
Network ports are relatively cheap nowadays. There is really no excuse not to implement some sort of link redundancy.
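On the PVE side that is just a Linux bond in /etc/network/interfaces; a minimal LACP sketch, assuming two hypothetical NICs (eno1/eno2), a dedicated storage subnet, and switch ports configured in a matching LAG:

    # per-node storage bond (ifupdown2); NIC names and address are placeholders
    auto bond0
    iface bond0 inet static
        address 10.10.10.21/24
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4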
4. PBS (Proxmox Backup Server) - any particular reason I'd need it, other than tape library support? I mean, for our small infrastructure the internal backup schedules at cluster level, which come out of the box, are totally enough... As for pruning, well, I'll figure something out, as those are plain files anyway.
I have no strong opinion on this. You could use either tape or NFS, or both, and replicate to a remote site for an off-site copy. This really depends on your SLA.
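On the pruning point from the question: with the built-in vzdump jobs, retention can simply be set per backup storage rather than scripted by hand; a sketch with a hypothetical storage ID and arbitrary keep counts (available since PVE 6.3):

    # vzdump prunes according to this policy after each backup run
    pvesm set backup-nfs --prune-backups keep-last=2,keep-daily=7,keep-weekly=4,keep-monthly=6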
5. I enjoy reading @bbgeek17's posts and wonder if I want/need the Blockbridge storage driver for Proxmox with my 3rd-party storage appliances. I'm not sure whether you guys have some kind of community version. Pricing for the commercial version is a question as well.
Thank you! The Blockbridge Proxmox plugin/driver is built to communicate over an API with the Blockbridge storage software running on dedicated off-the-shelf servers. These servers can be attached to FC/SAS - see the 3rd, 4th, and 5th items here: https://www.blockbridge.com/architectures/
The platforms we recommend today are modern all-NVMe Dell/Supermicro servers with Micron disks - the "Modular Ethernet Cluster" model.
For pricing, please reach out directly.

6. Any gotchas or tips for implementing an NFS data backend in production? Honestly, it feels too good; are there any drawbacks? We have about 40 VMs, mixed: Windows, AD, file servers, Linux servers, MS SQL, MS Exchange, with 9 TB of storage. The plan is to migrate to open source completely, so we'll drop everything Microsoft-based except the client machines.
The main things to keep in mind are latency and the ability to persist the NFS DRC (duplicate request cache) on failover to prevent a client outage. Each vendor provides its own tunings, and if you go with TrueNAS, their support would be best placed to determine those. MS Exchange on your list jumps out at me - I do not believe MS supports NFS as Exchange storage. It may work, but you know what will happen the moment you need to open a case: "please use a supported storage configuration".

If you were to use Blockbridge, I would actually advise connecting Exchange and SQL _directly_ to iSCSI.
7. In some threads here, people have reported poor 10G performance with Proxmox + TrueNAS, like close to 5 Gb/s. Does anyone have fresh experience with that? For example: https://forum.proxmox.com/threads/prox-truenas-10g-connection-sharing-results.110202/
Performance expectations and troubleshooting are extremely dependent on the environment. It should always start with the basics: is iperf between server and host OK? Is basic NFS performance OK between server and host, i.e. before virtualization enters the picture? There is no objective reason to expect to be capped at 5 Gb/s with the above setup in all cases.
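A rough order for those baseline checks, run from a PVE node against the filer before any VM is involved (addresses and the mount path are placeholders):

    # 1. raw network: run "iperf3 -s" on the storage box, then from the node:
    iperf3 -c 10.10.10.10 -P 4 -t 30

    # 2. NFS throughput from the host itself, before virtualization:
    fio --name=nfs-seq --directory=/mnt/pve/truenas-nfs \
        --rw=write --bs=1M --size=4G --direct=1 --numjobs=1 --group_reporting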

Good luck!


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I'm wondering why Ceph wasn't mentioned as an option for an HA cluster.
At least that would remove the single point of failure of the storage.
 
I'm wondering why Ceph wasn't mentioned as an option for an HA cluster.
At least that would remove the single point of failure of the storage.
I was mainly addressing the original topic, which asked about NAS and SAN. But you are absolutely right - Ceph is another option, and its biggest advantage is full integration and support from PVE end-to-end. In 99% of cases, however, existing FC/SAS gear would not be a good fit for backing Ceph.
The disadvantage: I suspect Microsoft would not look any more favorably on Ceph as backing storage for MS Exchange and MS SQL than on NFS.
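For anyone weighing that route: the PVE-integrated setup is only a handful of commands per node; a rough sketch, assuming three nodes with local NVMe and a dedicated Ceph network (the subnet and device names are placeholders):

    pveceph install                          # repeat on every node
    pveceph init --network 10.10.20.0/24     # once, on the first node
    pveceph mon create                       # on each of the three nodes
    pveceph mgr create                       # at least one manager
    pveceph osd create /dev/nvme1n1          # per local data disk
    pveceph pool create vmpool --add_storages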


 
Thank you, Sir, for the detailed answer. It is very nice to see that I was right in my assumptions on the major points.

Thank you! The Blockbridge Proxmox plugin/driver is built to communicate over an API with the Blockbridge storage software running on dedicated off-the-shelf servers. These servers can be attached to FC/SAS - see the 3rd, 4th, and 5th items here: https://www.blockbridge.com/architectures/
The platforms we recommend today are modern all-NVMe Dell/Supermicro servers with Micron disks - the "Modular Ethernet Cluster" model.
For pricing, please reach out directly.
Oh, I thought it was either hardware-dependent, using your own OS and appliances, OR just an independent advanced driver that improves throughput in Proxmox. But yeah, I forgot it has a data plane and API control. Anyhow, the all-flash hardware listed on your website is absolutely sick - the stuff of my dreams.

The main things to keep in mind are latency and the ability to persist the NFS DRC (duplicate request cache) on failover to prevent a client outage. Each vendor provides its own tunings, and if you go with TrueNAS, their support would be best placed to determine those. MS Exchange on your list jumps out at me - I do not believe MS supports NFS as Exchange storage. It may work, but you know what will happen the moment you need to open a case: "please use a supported storage configuration".
No, Microsoft definitely does not support NFS, although that may have changed with the latest Server 2022 - I haven't checked. Anyway, I meant NFS only as shared VM data-disk storage holding the qcow2 files; inside the guest you can obviously deploy any filesystem you want, NTFS in this case.

I was mainly addressing the original topic, which asked about NAS and SAN. But you are absolutely right - Ceph is another option, and its biggest advantage is full integration and support from PVE end-to-end. In 99% of cases, however, existing FC/SAS gear would not be a good fit for backing Ceph.
The disadvantage: I suspect Microsoft would not look any more favorably on Ceph as backing storage for MS Exchange and MS SQL than on NFS.
Yeah, Ceph looks sophisticated on paper, and I took a look at it as well, but when I dug deeper into the configuration and maintenance... not my case, not for a small private infrastructure.

If you were to use Blockbridge, I would actually advise connecting Exchange and SQL _directly_ to iSCSI.
Aha! That is probably a very good option. I'll take a look at possible all-flash budget options. As far as I understand, your solution is hardware-independent. Exchange would definitely be blazing fast in such a scenario.

Thank you all for the support and comments. It's nice to have such an experienced community here; happy to join.

Regards,
MarvinFS
 
I'm wondering why Ceph wasn't mentioned as an option for an HA cluster.
At least that would remove the single point of failure of the storage.
The single point of failure in a DIY storage setup is definitely not NFS itself :) - it's most probably the controller, the backplane (in my career I experienced that once with Supermicro), or a PSU, for example. But I totally agree: when going to production you need to make sure all the pieces are redundant. BTW, I haven't yet checked what happens on a link failover with NFS - need to test that... probably nothing bad, as the data streams are basically TCP.
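One way to check without waiting for a real failure, using placeholder names (a bond member eno1 and the NFS datastore from earlier): kick off a long write to the datastore, drop one leg of the bond, and see whether the transfer just pauses and resumes.

    # sustained write against the NFS-backed datastore
    dd if=/dev/zero of=/mnt/pve/truenas-nfs/failover-test.img bs=1M count=20000 oflag=direct &

    # pull one bond member, then check bond state and whether dd keeps going
    ip link set eno1 down
    cat /proc/net/bonding/bond0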
 
