Storage types and replication: NFS and local ZFS in a cluster

FcT

Active Member
Jul 31, 2019
Hi Proxmox community,

I am currently running a 3-node cluster, and all my VM disks are stored on NFS shares provided by another physical server. As that NFS server is a SPOF, I would like to replicate the VMs and their disks on my third node:
pve1 and pve2 have access to the NFS shares named SAS & SATA.
pve3 does not use the NFS shares; it has local disks and two ZFS pools named SAS & SATA (both created before joining the cluster) that will be used as ZFS replication targets.

The GUI (Datacenter => Storage => Add => ZFS) lets me browse to the local ZFS pools on pve3, but fails to create the storage with the same ID (SAS or SATA, required by the replication, which makes sense). Is there a workaround (ZFS over iSCSI instead of NFS?) to use these local disks as an acceptable replication target?
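For context, my current storage definitions look roughly like this (a sketch of /etc/pve/storage.cfg; the server address and the SAS-local ID are made up). From what I understand, storage IDs are unique cluster-wide, so the local pool would need a different ID, which replication then refuses:

```
nfs: SAS
        server 192.168.1.10
        export /tank/sas
        content images
        nodes pve1,pve2

zfspool: SAS-local
        pool SAS
        content images,rootdir
        nodes pve3
```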

How would you mitigate the SPOF created by the NFS server, or how could I use my third server as a Business Continuity Plan?

Thank you all,
FcT
 
it has local disks and two ZFS pools named SAS & SATA (both created before joining the cluster) that will be used as ZFS replication targets.
Replicated from what? ZFS replication needs a ZFS pool on both the source and the destination. Your NFS storage cannot be the source, so you have only the one pool on pve3 and nothing to replicate to it.


How would you mitigate the SPOF created by the NFS server, or how could I use my third server as a Business Continuity Plan?
I don't see a way with the hardware you have described. I would use a proper HA NFS solution or a SAN, either dedicated (a box with two controllers) or distributed (Ceph, StarWind, DRBD). You could also use ZFS replication inside your cluster, but that is a pain to set up and maintain, and you would need the storage inside the nodes. Better to go directly to a 3-node Ceph cluster, which is the minimum size and yields a real cluster where any single component can fail without interrupting the overall PVE cluster.
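For completeness, the ZFS replication I mean is PVE's built-in storage replication, driven by pvesr. A minimal sketch, assuming VM 100 lives on pve1 with its disks on a zfspool storage that exists under the same storage ID on the target node:

```shell
# Create a replication job (job id is <vmid>-<jobnum>) to pve2,
# running every 15 minutes
pvesr create-local-job 100-0 pve2 --schedule "*/15"

# Inspect configured jobs and their last run
pvesr list
pvesr status
```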
 
Thank you for the clarification, MnxBil. I wrongly thought that because the underlying filesystem of my NFS shares is ZFS, it might have been possible.
I now understand that I have nothing to replicate to pve3. I actually came from the setup you described: ZFS replication inside the cluster, with local disks and pools in each server. It worked great for my environment, but was really a waste of disk space in the end.
A proper HA NFS setup or a SAN is out of reach for my organisation; I will try to find a migration path to Ceph in the near future.

Thanks again for your insights and expertise !
FcT
 
To chime in further, 3-node Ceph clusters are inherently fragile. You should seriously consider the limitations before implementing this solution, as it could end up being a headache if you are not adequately prepared.

To play devil's advocate, do you truly need the 3-node environment? Could you not get by with a traditional ZFS replication setup (like you are used to) between two of the servers? The third server could then be a QDevice or a glorified quorum-voting PVE instance.
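If you go the QDevice route, the setup is small. A hedged sketch (the third machine's IP address is an assumption):

```shell
# On the third machine (a plain Debian box is fine):
apt install corosync-qnetd

# On the two cluster nodes (corosync-qdevice must be installed on both),
# then point the cluster at the external vote daemon:
apt install corosync-qdevice
pvecm qdevice setup 192.168.1.30

# Verify that the QDevice now provides the third vote
pvecm status
```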

If in the end you are truly serious about a three-node Ceph cluster and want or need to save a buck on switching gear, or even to arguably reduce complexity, you could leverage a full mesh network as discussed in the following wiki article: https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
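For a rough idea of what the wiki describes: each node gets a direct link to each of the other two, with no switch in between. A sketch of the routed variant on one node (interface names and the 10.15.15.0/24 addressing are assumptions; see the wiki for the exact recipes and the other variants):

```
# /etc/network/interfaces fragment on pve1 (10.15.15.50)
auto ens19
iface ens19 inet static
        address 10.15.15.50/24
        # direct cable to pve2
        up ip route add 10.15.15.51/32 dev ens19

auto ens20
iface ens20 inet static
        address 10.15.15.50/24
        # direct cable to pve3
        up ip route add 10.15.15.52/32 dev ens20
```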

Hope that helps.
 
Ceph in Proxmox is the endgame, but as they said, if you really don't need that type of HA (or even real HA), you could go with storage replication. It works well, manual failover is fairly quick, and with backups you're good.
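That manual failover boils down to moving the VM's config to the surviving node and starting it from the last replicated disk state. A hedged sketch, assuming pve1 died and VM 100 was being replicated to pve2 (run on pve2; paths follow the standard /etc/pve layout):

```shell
# Move the VM's config from the dead node to the surviving one
mv /etc/pve/nodes/pve1/qemu-server/100.conf \
   /etc/pve/nodes/pve2/qemu-server/100.conf

# Start the VM from the last replicated state
qm start 100
```

Anything written between the last replication run and the failure is lost, which is the trade-off versus shared storage.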
 
To chime in further, 3 node Ceph clusters are inherently fragile.
...and yet much more stable than any other solution without proper built-in HA, like ZFS replication or even NFS, or any other SPOF-based setup like OP's. Sure, a 5-node cluster is better than 3, but if you're coming from a two-node VMware cluster, you're not going to build a 5-node Ceph cluster. Most don't even want a 3-node Ceph cluster, because it's 50% more expensive than before without giving any advantage over the two-node cluster from an end-user view. The third node often costs more than just licensing VMware for two nodes, so it's a no-brainer.
 
...and yet much more stable than any other solution without proper built-in HA, like ZFS replication or even NFS, or any other SPOF-based setup like OP's. Sure, a 5-node cluster is better than 3, but if you're coming from a two-node VMware cluster, you're not going to build a 5-node Ceph cluster. Most don't even want a 3-node Ceph cluster, because it's 50% more expensive than before without giving any advantage over the two-node cluster from an end-user view. The third node often costs more than just licensing VMware for two nodes, so it's a no-brainer.
Fair enough, definitely better than a single NFS share (SPOF). If I were OP, I would do Ceph, no-brainer. That being said, as resilient as Ceph is, a 3-node cluster is definitely the least resilient possible Ceph configuration. The forum post that was linked previously does a great job of explaining the issues, so I won't beat the dead horse. My recommendation is just to consider all possibilities and have all the information available before being blindsided later down the road.

ZFS replication is the easiest, most set-and-forget configuration for a 2-node cluster. It definitely becomes wasteful and less attractive for a 3-node cluster, which is why I proposed the possibility of a 2-node cluster here.

We could probably go back and forth all day on this, but ultimately: Ceph is awesome, 3-node Ceph is a bit of a nightmare if you're not prepared, and even a four-node cluster is better, as the linked post says. If OP is ambitious and ready to deal with some complexity and fragility, they can go for 3-node Ceph and will probably love it.

Cheers