I am setting up a tiny cluster. The only reason I need more than one machine is fault tolerance (I will be using 3 for quorum and for simpler upgrade and maintenance cycles). Since 3 physical hosts already give me plenty of redundancy, I'm not planning on multiple PSUs, NICs or RAID. To provide resilience against a node failing completely, I want the container/VM disks (I'm expecting a mixture of both) replicated or backed up across all the nodes. Although I want a low RTO, I can accommodate quite a high RPO (24 hours or more).
I am seeking very high uptime in the absence of node failures. Some of the services (HTTP, SMTP) will be replicated across instances pinned to specific physical nodes. However, at least in the case of an NFS server (it's using separate iSCSI storage), implementing this on concurrent hosts is rather esoteric. Hence the backup/replication must be minimally disruptive to the running guests.
While Proxmox provides replication and snapshots, these appear to depend on ZFS or Ceph for storage. From my research, both appear to have major performance overheads at such a small scale.
1) Is it worth considering filesystem-level snapshots (e.g. BTRFS/LVM), simply mounting the snapshot and copying out the files, and relying on crash recovery to bring these images back online?
2) Should I just configure local backups in Proxmox and replicate these separately with a scheduled rsync?
3) Which of CephFS, Ceph RBD and local ZFS is least bad for performance at this scale?
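To make option 2 concrete, this is roughly what I have in mind: a rough sketch only, not something I have tested on a cluster. The dump directory and the peer hostnames (`node2`, `node3`) are placeholders I made up, and I am assuming a snapshot-mode `vzdump` of all guests followed by a plain `rsync` to each peer, run once a day to match the 24 hour RPO. It only prints the commands unless `dry_run` is turned off.

```python
import shlex
import subprocess

# Assumptions for illustration only: adjust to the real storage layout.
DUMP_DIR = "/var/lib/vz/dump"      # placeholder local backup directory
PEERS = ["node2", "node3"]         # hypothetical hostnames of the other nodes


def build_commands(dump_dir=DUMP_DIR, peers=PEERS):
    """Return the backup command followed by one rsync command per peer."""
    cmds = [[
        "vzdump", "--all", "1",        # back up every guest on this node
        "--mode", "snapshot",          # keep guests running during the backup
        "--compress", "zstd",
        "--dumpdir", dump_dir,
    ]]
    for peer in peers:
        # Trailing slashes: copy the directory contents, not the directory itself.
        # --delete mirrors local pruning of old backups onto the peer.
        cmds.append(["rsync", "-a", "--delete",
                     f"{dump_dir}/", f"{peer}:{dump_dir}/"])
    return cmds


def run(cmds, dry_run=True):
    """Print each command, or execute it and stop on the first failure."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_commands())
```

Each node would run this from a daily cron job or systemd timer, so every host ends up holding a copy of the other hosts' backups.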