Proxmox Cluster Improvement Ideas / Questions for a Small Business

Jul 24, 2024
We moved the hardware we had on hand to Proxmox a year ago and are now looking ahead to what's next (fine-tuning, improving reliability, and best practices).

This is for a small business of around 10 users with a small amount of data (1.5 TB). The data is mostly spreadsheets, documents, and Access databases. The amount of data grows slowly over time.

Requirements for the setup:
  • Local / No cloud based storage
  • Run around 6 VMs (Active Directory VM, TrueNAS VM, VM to host and run docker apps on the LAN, and some other app VMs)
    • The VMs all run well on a single Proxmox node with a single Intel Xeon processor that has a PassMark (https://www.cpubenchmark.net/) multithread rating of around 8000 and a single-thread rating of around 2000.
    • VMs run well on a pair of enterprise Mixed-Use SSDs (Kingston DC600M).
  • Minimal to no downtime is a high priority (the company has time-sensitive operations).
  • An offsite server that can be switched to very quickly in case something happens to the main office.
Our current setup:
  • 3 Node Proxmox Cluster
    • Primary Node (Currently 2 Years Old)
      • All the VMs currently run on this node.
      • Runs Proxmox Backup Server as a VM.
        • Stores around 3 TB worth of VM backups.
        • Backs up all VMs nightly locally.
        • Syncs all the backups to a plugged-in 4 TB consumer external SSD every night.
          • The external SSD is unplugged and rotated offsite each Friday. There are 5 external SSDs in the rotation (4 at an offsite location and 1 plugged in onsite).
          • External SSD filesystems use ZFS.
        • The PBS VM has ZFS as the filesystem.
      • Host OS uses ZFS (2 x 8 TB Enterprise SATA drives mirrored).
      • Non-ECC Memory
    • Secondary Node (Currently 5 Years Old)
      • Replication Target for the VMs (Except the PBS) every 15 minutes.
      • Host OS uses ZFS
        • 8 x 2 TB Consumer SATA SSDs in a RAID 10 equivalent config (Total 8 TB storage).
      • ECC-Memory
      • Located in a different building down the road. It has a clear line of sight, so the local network is extended to it over a point-to-point WiFi link that gets around 300 Mb/s.
      • Doesn't do much else (Test VMs sometimes).
    • Tertiary Node (Currently 8 Years Old)
      • Replication Target for the VMs (Except the PBS) every 15 minutes.
      • Host OS uses ZFS.
        • 4 x 2 TB Enterprise HDD in a RAID 10 equivalent config (Total 4 TB storage).
      • ECC-Memory
      • Onsite next to the Primary Node.
      • Doesn't do much else.
The Primary Node gets replaced every 3 years; the old primary then becomes the secondary, and the old secondary becomes the tertiary.
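
As a rough sanity check on the 15-minute replication to the secondary node, the sketch below estimates how much changed data could cross the point-to-point link within one interval. The 300 Mb/s and 15-minute figures come from the setup above; the function name and the 70% efficiency factor are assumptions made only for illustration, and real throughput over WiFi may differ.

```python
# Back-of-the-envelope estimate: how much changed data fits through the
# point-to-point link during one replication interval.
# From the post: ~300 Mb/s link, replication every 15 minutes.
# Assumption: the efficiency factor (protocol overhead, WiFi variability).


def max_delta_gb(link_mbit_s: float, interval_min: float, efficiency: float = 0.7) -> float:
    """Return the approximate number of GB transferable in one interval."""
    bytes_per_second = link_mbit_s * 1_000_000 / 8 * efficiency
    return bytes_per_second * interval_min * 60 / 1_000_000_000


if __name__ == "__main__":
    print(f"{max_delta_gb(300, 15, 1.0):.2f} GB per interval at full line rate")
    print(f"{max_delta_gb(300, 15):.2f} GB per interval at the assumed efficiency")
```

At full line rate this works out to 33.75 GB per 15-minute window. If the VMs ever change more data than that between runs, replication to the secondary would start to fall behind its schedule.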

Potential improvement ideas:
  • Proxmox Backup Server VM
    • Switch the guest OS's local filesystem to ext4 or XFS (to get rid of the write amplification from running ZFS inside the VM on top of the host's ZFS).
    • Switch the filesystem for external drives to ext4 or XFS instead of ZFS.
    • Move the PBS VM to a different node so that backup jobs don't increase IO delay for the other VMs.
    • Replicate the PBS VM to the other nodes (Most likely nightly).
  • Primary Node
    • Eventually retire or repurpose this node and replace it with one that supports ECC memory.
  • Secondary Node
    • Replace the consumer SSDs with 2 x 8 TB Enterprise SSDs.
  • Tertiary Node
    • Replace the enterprise HDDs with 2 x 8 TB Enterprise SSDs.
  • Turn on High Availability so the VMs automatically fail over if the Primary Node fails.
Right now, there is no plan to run PBS on dedicated hardware. However, I know that is the recommended practice and might try to work it in.
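
One thing worth working through before turning on High Availability is quorum. The sketch below assumes the default of one vote per node and the usual strict-majority rule (more than half of the expected votes); the function names and the scenarios are illustrative only, based on the layout above where the secondary sits behind the WiFi link.

```python
# Quorum arithmetic for a 3-node cluster.
# Assumption: each node has exactly one vote and quorum is a strict majority.


def quorum_votes(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, reachable_votes: int) -> bool:
    """True if the reachable votes (including the local node) form a majority."""
    return reachable_votes >= quorum_votes(total_votes)


if __name__ == "__main__":
    total = 3
    scenarios = {
        "all nodes up": 3,
        "WiFi link down, seen from primary + tertiary": 2,
        "WiFi link down, seen from the secondary": 1,
        "primary failed and WiFi link down, seen from the tertiary": 1,
    }
    for name, votes in scenarios.items():
        state = "quorate" if has_quorum(total, votes) else "NOT quorate"
        print(f"{name}: {votes}/{total} votes -> {state}")
```

With three votes, any single failure is tolerated, but a second simultaneous problem (for example the primary failing while the WiFi link is down) leaves each surviving node without a majority, so automatic failover should not be expected in that case.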

Questions:
  • Any suggestions on how to better handle the company requirements with Proxmox and Proxmox Backup Server?
  • Any red flags with our current setup and the potential improvement ideas?
  • Does switching the PBS VM filesystem from ZFS to ext4 or XFS make sense? If so, would you recommend ext4 or XFS?
  • How much is gained or lost by switching the external SSDs used for PBS backups away from ZFS?
    • From what I've read, ZFS on a single disk detects data corruption better than ext4 / XFS but has more overhead. However, PBS verify jobs would spot data corruption anyway, right?
  • Are there potential downsides to turning on High Availability with this setup?
  • We currently use a TrueNAS VM (which holds the bulk of the data) for file shares. The only features we really use are SMB shares, share / file permissions based on Active Directory info from our Active Directory VM, file storage, and ZFS snapshots taken inside TrueNAS (to let users use the Previous Versions tab in Windows on files). TrueNAS is most likely overkill for that… any suggestions on alternative setups?
I have some other setup ideas too for the entire PVE cluster + PBS but would like to hear your thoughts first.
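
To make the verify-job question more concrete, here is a minimal sketch of how checksum-based verification can catch silent corruption regardless of the underlying filesystem. It is a toy content-addressed store in which each chunk file is named after the SHA-256 digest of its contents. It is not PBS's actual on-disk format or code, and the function names are hypothetical.

```python
# Toy content-addressed chunk store with a verify pass.
# Assumption: illustrative only; this does not mirror PBS internals.
import hashlib
import tempfile
from pathlib import Path


def store_chunk(store: Path, data: bytes) -> Path:
    """Write a chunk to a file named after the SHA-256 digest of its data."""
    path = store / hashlib.sha256(data).hexdigest()
    path.write_bytes(data)
    return path


def verify_store(store: Path) -> list[str]:
    """Re-hash every chunk and return the names of those that no longer match."""
    corrupt = []
    for path in sorted(store.iterdir()):
        if hashlib.sha256(path.read_bytes()).hexdigest() != path.name:
            corrupt.append(path.name)
    return corrupt


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        store_chunk(store, b"spreadsheet data")
        damaged = store_chunk(store, b"access database")
        damaged.write_bytes(b"access databasf")  # simulate silent corruption
        print("corrupt chunks:", verify_store(store))
```

A check like this only detects a damaged chunk; it cannot repair it. The same is true of ZFS on a single disk without extra copies, so in either case recovery depends on having another good copy, such as one of the other rotated drives.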

Thank you!
 