Advice on 3-node Proxmox storage and backup setup (ZFS, RAID10, backup strategies)

veryuniqueusername

Jun 26, 2024

Hello everyone,

I'm currently planning the setup of a new 3-node Proxmox cluster and would really appreciate some advice regarding storage configuration and backup strategy.

I have limited experience with Proxmox (I inherited an older cluster that was set up by former colleagues), so I'm looking for some guidance.

Hardware Overview (Identical for all 3 nodes):
  • CPU: Intel Core i9-14900K (BIOS updated to the latest version, which includes microcode 0x12F)
  • RAM: 4x 32 GB ECC Unbuffered RAM (128 GB total)
  • OS Drives: 2x Micron 7450 PRO 480 GB M.2-2280
  • VM/Container Drives: 4x 1.92 TB Samsung PM893 (SATA SSDs)
  • Networking:
    • Add-on NIC: 2x 10 GbE
    • Onboard: 1x 2.5 GbE and 1x 1 GbE
The hardware has already been purchased, and there are no plans to acquire more, so I'd appreciate advice within those constraints.

The nodes will be interconnected via a 1 GbE switch, which will eventually be upgraded to 10 GbE.
The Synology NAS used for backups only supports 1 GbE.
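For a rough sense of how long backups to the NAS might take over that 1 GbE link, here is a back-of-envelope estimate. It is only a sketch: the ~110 MB/s effective throughput is an assumption (1 GbE line rate is 125 MB/s before protocol overhead), not a measurement of this NAS, and the data sizes are hypothetical.

```python
# Back-of-envelope backup-window estimate over a 1 GbE link to the NAS.
# ASSUMPTION: ~110 MB/s effective throughput (1 GbE line rate is 125 MB/s
# before protocol overhead); an HDD-based NAS may sustain less than this.

LINK_MB_PER_S = 110  # assumed effective rate in decimal MB/s, not measured


def transfer_hours(data_gb: float, mb_per_s: float = LINK_MB_PER_S) -> float:
    """Hours needed to move data_gb (decimal GB) at mb_per_s (decimal MB/s)."""
    return data_gb * 1000 / mb_per_s / 3600


if __name__ == "__main__":
    # Hypothetical amounts of backup data per run, from one VM up to a full node.
    for gb in (100, 500, 1000, 2000):
        print(f"{gb:>5} GB -> {transfer_hours(gb):5.1f} h")
```

If all three nodes back up at the same time, they share the NAS's single 1 GbE port, so each node gets only a fraction of that rate. Compression would typically shorten these times, and a slow HDD array could lengthen them.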

Existing Setup Context:

We currently run a legacy Proxmox 3-node cluster that is several years old and nearing end-of-life.
Each node consists of:
  • 1x i7-4790K (4c/8t)
  • 32 GB RAM
  • Local LVM-based storage (on SSDs and HDDs of various sizes) with no redundancy
  • Backups to a local Synology NAS via NFS, plus some local backups on the nodes themselves
It has worked reasonably well over the years, but it lacks redundancy and resilience against hardware failures.

Expected Workload:

The new cluster will be primarily used for development and testing, but will also run a few services with higher reliability expectations.
Expected workloads include:
  • Windows Server 2022/2025 running Active Directory (ideally 3 domain controllers, one per node)
  • A few always-on Windows Server VMs with varying (generally low) loads
  • A few Windows 10/11 VMs for testing (mostly idle or preferably stopped when not used)
  • A few LXC containers (low resource requirements)
  • A Windows 10/11 VM hosting the Jenkins controller (main Jenkins server)
  • A few Windows build machines (VMs) for compiling and automated builds
VMs and containers will be distributed as needed across the cluster.
On average, I expect that each VM will be allocated 50–100 GB of storage.
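As an illustration of "distributed as needed", the sketch below spreads guests across the three nodes by allocated disk while pinning one domain controller per node. It uses a simple greedy heuristic (largest guest first onto the least-used node), which is my own choice for illustration and not a Proxmox feature; the node names, VM names and sizes are all hypothetical, with sizes picked from the 50–100 GB range.

```python
# Hypothetical sketch: spread guests across three nodes by allocated disk,
# keeping exactly one domain controller per node. All names and sizes are
# made up for illustration; sizes follow the 50-100 GB-per-VM estimate.

NODES = ["node-a", "node-b", "node-c"]


def place(vms: dict[str, int], pinned: dict[str, str]) -> dict[str, list[str]]:
    """Greedy placement: pinned guests first, then largest-first onto the least-used node."""
    layout = {n: [] for n in NODES}
    used = {n: 0 for n in NODES}  # allocated GB per node
    for vm, node in pinned.items():
        layout[node].append(vm)
        used[node] += vms[vm]
    unpinned = sorted((v for v in vms if v not in pinned), key=vms.get, reverse=True)
    for vm in unpinned:
        target = min(NODES, key=used.get)  # ties go to the earlier node in NODES
        layout[target].append(vm)
        used[target] += vms[vm]
    return layout


if __name__ == "__main__":
    vms = {"dc1": 60, "dc2": 60, "dc3": 60, "jenkins": 100, "build1": 100,
           "build2": 100, "win11-test": 80, "lxc-tools": 50}
    pinned = {"dc1": "node-a", "dc2": "node-b", "dc3": "node-c"}
    for node, guests in place(vms, pinned).items():
        print(node, guests, sum(vms[g] for g in guests), "GB")
```

A greedy pass like this is only a starting point; in practice placement would also weigh CPU and RAM load and keep the build machines away from the node hosting Jenkins if that matters.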

Proposed Setup for New Cluster:

Initial plan:
  • OS: ZFS mirror on the two Micron 7450s
  • VM/Container Storage: 2x PM893 (mirrored, ~1.9 TB usable)
  • Backup Storage: 2x PM893 (mirrored, ~1.9 TB usable)
Suggested alternative (by a third party):

Use all four PM893 SSDs in a single ZFS RAID10 pool (two mirrored pairs striped together), both to improve performance and to avoid splitting the drives into separate volumes for storage and backups:
  • OS: ZFS mirror on the two Micron 7450s
  • VM/Container Storage: ZFS RAID10 on the 4x PM893 SSDs (~3.8 TB usable)
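To compare the two layouts in numbers, here is a small capacity sketch. It assumes each 2-way mirror contributes one drive's worth of space, ignores ZFS metadata and reserved space, and applies the commonly cited rule of thumb of keeping a pool below ~80% full (a guideline, not a hard limit). The VM sizes come from the 50–100 GB estimate above.

```python
# Usable capacity per node for the two proposed layouts of the 4x 1.92 TB PM893s,
# and roughly how many 50-100 GB VMs fit in each.
# ASSUMPTIONS: ZFS metadata/reserved space is ignored, and pools are kept
# below ~80% full (a rule of thumb, not a hard limit).
import math

DRIVE_TB = 1.92    # Samsung PM893, decimal TB as marketed
FILL_LIMIT = 0.80  # assumed fill guideline


def mirror_pool_tb(mirror_vdevs: int, drive_tb: float = DRIVE_TB) -> float:
    """Usable TB of a pool of 2-way mirror vdevs (striped, i.e. RAID10, when > 1)."""
    return mirror_vdevs * drive_tb  # each 2-way mirror contributes one drive


def vms_that_fit(pool_tb: float, vm_gb: int) -> int:
    """Number of vm_gb-sized VMs that fit while staying under FILL_LIMIT."""
    return math.floor(pool_tb * 1000 * FILL_LIMIT / vm_gb)


LAYOUTS = {
    "Initial plan (one mirror for VMs, the other pair for backups)": mirror_pool_tb(1),
    "Alternative (RAID10 over all four SSDs)": mirror_pool_tb(2),
}

if __name__ == "__main__":
    for name, tb in LAYOUTS.items():
        tib = tb * 1e12 / 2**40  # tools often report binary units
        print(f"{name}: {tb:.2f} TB (~{tib:.2f} TiB) usable, "
              f"about {vms_that_fit(tb, 100)} (100 GB) to {vms_that_fit(tb, 50)} (50 GB) VMs")
```

The trade-off it makes visible: RAID10 roughly doubles the space available for VMs on each node, but it leaves no separate on-node backup volume, so backups would have to go to another node or the NAS.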
This sounds like a good performance boost, but coming from the old cluster, I'm actually not too concerned about performance.
My main focus is getting the storage configuration right, which leads me to the following questions:

Questions:
  1. If I go with the ZFS RAID10 option, how should I approach backups?
    1. Is it possible, safe, and sensible to back up VMs from one node to another (e.g., Node A → Node B)?
    2. Is using the Synology NAS (1 GbE, HDD-based) via NFS still a sensible off-node backup target?
  2. Any advice or gotchas when using ZFS RAID10 with SATA SSDs for this kind of workload?
  3. Any suggestions for lightweight high availability? I don't need full HA, but services like Jenkins would benefit from some level of fault tolerance.

I should also note that I have very limited experience with ZFS, so any advice or best practices there would be appreciated too.


Thanks in advance for your help!

PS - yes, "AI" was used to help outline this post :)