Cluster and Backup setup questions

Dunuin

Distinguished Member
Jun 30, 2020
Hi,

Right now my Homelab looks like this:

Server A:
- bare-metal Debian with PVE 7.1 on top
- runs 24/7
- local ZFS storage for VM/LXC disks backed up to PBS on server B
- I back up the two system disks to my PBS VM on server B by booting from a Debian USB stick and then using proxmox-backup-client to do block-level backups of them
- many VMs need to access SMB/NFS shares on server B
- got most of my RAM and CPU power (64GB RAM + 32 threads)
- running all my guests (21 VMs + 5 LXCs) except for my backup Pi-hole VM, backup OPNsense VM and my PBS 2.1 VM
- using 4x Gbit LACP bond for LAN/DMZs + 1x 10Gbit for storage backend + 1x Gbit directly connected to Server B for low latency (pfsync)
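The block-level system-disk backup described above can be sketched roughly like this, run from the booted Debian USB stick; the repository name, credentials and disk paths below are made-up placeholders:

```shell
# Sketch of a block-level host backup with proxmox-backup-client.
# Repository, password and disk IDs are example placeholders.
export PBS_REPOSITORY='root@pam@pbs.home.lan:datastore1'
export PBS_PASSWORD='secret'   # or use an API token instead

# Back up both system disks as raw .img archives:
proxmox-backup-client backup \
    disk0.img:/dev/disk/by-id/ata-SYSTEMDISK1 \
    disk1.img:/dev/disk/by-id/ata-SYSTEMDISK2 \
    --backup-type host --backup-id serverA
```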

Server B:
- bare metal TrueNAS Core 12.0
- runs 24/7
- centralized storage for all hosts, so every host needs access to its SMB/NFS shares
- only 32GB RAM + 8 threads, so it can't run many guests
- running my backup OPNsense VM, my backup Pi-hole VM and my PBS 2.1 VM; the PBS datastore is stored locally, accessed via NFS
- using 1x Gbit for LAN/DMZs + 1x 10Gbit for storage backend + 1x Gbit directly connected to Server A for low latency (pfsync)

Server C:
- bare metal TrueNAS Core 12.0
- runs for just a few hours once per week to receive ZFS replication from server B and shuts down afterwards
- only 16GB RAM + 8 threads (same mainboard/HDDs/CPU as server B)
- doesn't run any guests
- using just 1x Gbit for LAN/DMZs (would like to add a 10Gbit NIC too, but the original idea was to use a 300Mbit Wifi bridge and put it in the basement as a sort of pseudo-offsite backup where I couldn't run any cables)


TrueNAS is fine if you just want to serve some SMB/NFS shares, but it really sucks when working with virtualization, complicated network layouts, monitoring, customization and so on. I would really prefer to run PVE bare metal on server B too (maybe even on all three servers) and then just virtualize TrueNAS in a VM. I already got the two HBAs, so PCI passthrough hopefully isn't a problem (haven't checked the IOMMU groups yet... it's a Supermicro X10SSL-F).
What about creating a cluster? I don't need HA and I don't want to use shared storage (Ceph would be nice, but I can't get a third server that runs 24/7 because of the electricity costs). It would just be nice to be able to offline-migrate guests between nodes and to manage both nodes through the same web UI. Or does a migration still require shared storage? I don't need live migration; I'd be fine with the guest being offline while migrating.
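For reference, an offline migration with only local storage is a one-liner per guest, as long as the target node has a storage with the same ID; the VMIDs and node name below are examples:

```shell
# Offline-migrate the stopped VM 101 to node "pve-b"; local disks
# are copied to the storage with the same ID on the target node.
qm migrate 101 pve-b

# Containers work analogously:
pct migrate 105 pve-b
```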
What about quorum? As far as I understand, I would need to set up a QDevice as a third voter, which isn't a problem, as I've got some spare Raspberry Pis. But what happens if I also install PVE on server C, which is offline most of the time? Will that be fine, or is it problematic because I'd then sometimes have an even number of voters?
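Setting up such a QDevice on a Raspberry Pi is only a few commands; the IP address below is a placeholder:

```shell
# On the Raspberry Pi (Debian-like OS, reachable via SSH as root):
apt install corosync-qnetd

# On every PVE cluster node:
apt install corosync-qdevice

# On one PVE node, register the Pi as the external tie-breaker vote:
pvecm qdevice setup 192.168.1.50
pvecm status   # should now list an additional "Qdevice" vote
```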

And how do I best back up the servers using PBS? Would it be enough to just run one PBS VM on server B with the datastore on NFS on server B too? The datastore is replicated once per week to server C, and I would create a weekly vzdump backup of the PBS VM that could be replicated to server C too. In case server A fails, I would still have PBS and the datastore on server B, so I could restore my PVE system disks and guests. In case server B fails, I would lose the PBS VM and the datastore, but I could start server C, which has the week-old backups of what server B lost, and share them read-only via NFS/SMB. Server A could then restore the vzdump backup of the PBS VM, and I would just need to edit the fstab of the restored PBS VM to change the share that stores the datastore from server B to server C. Or can't PBS do a restore when the dataset is read-only? That way I would get a PBS VM on server A and a datastore on server C, so the PVE system disks and guests of server B could be restored.
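The fstab change mentioned above would be a one-line edit inside the restored PBS VM; the hostnames and paths below are made-up examples:

```shell
# /etc/fstab of the restored PBS VM -- before (datastore on server B):
#   serverB.lan:/mnt/tank/pbs  /mnt/datastore  nfs  defaults     0 0
# ...and after, pointing at the read-only replica on server C:
#   serverC.lan:/mnt/tank/pbs  /mnt/datastore  nfs  ro,defaults  0 0
```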

Another option would be to have a second PBS VM on server A that uses the replicated read-only datastore on server C. That PBS VM would normally be stopped to not waste RAM/CPU, and I would only start it (and server C too, so the copy of the datastore is available) when server B needs to be restored. But would that work at all?

And running two PBS VMs with synchronization would maybe also be an option.
I first thought about running the main PBS VM on server B with the datastore on server B, and then excluding the datastore from replication so it doesn't get replicated from server B to server C. Then I could run a second PBS VM on server C with its datastore also stored on server C, but that gives me two problems: first, server C can barely handle TrueNAS with its low RAM, so there aren't the resources to run guests on it; and second, I would then again need to run VMs using TrueNAS, which I don't like. Maybe it would be an option to run the second PBS VM on server A with just the datastore on an NFS share on server C, and then set up a sync job in PBS so it pulls backups from the PBS VM on server B once a week?
In that case I would just need to start server C when server B fails, so I could restore server B from the PBS on server A. Is it problematic if the PBS VM on server A is running while the datastore on server C isn't accessible because it's shut down most of the time? Or should that be fine as long as the NFS share with the datastore is online again when the GC/prune/sync jobs start?
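Such a weekly pull can be configured on the second PBS instance as a remote plus a sync job; the remote name, host, auth ID, datastore names and schedule below are examples:

```shell
# On the PBS VM on server A: register the main PBS on server B
proxmox-backup-manager remote create pbs-serverB \
    --host 192.168.1.20 \
    --auth-id 'sync@pbs' \
    --password 'secret' \
    --fingerprint '<server B certificate fingerprint>'

# Pull its datastore into the local one (on the server C NFS share)
# once a week, Saturday night:
proxmox-backup-manager sync-job create weekly-pull \
    --remote pbs-serverB --remote-store datastore1 \
    --store datastore-c --schedule 'sat 02:00'
```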
 
Good morning,
you describe a complex situation. I can just add my personal approach for comparison.
And how do I best back up the servers using PBS? Would it be enough to just run one PBS VM on server B with the datastore on NFS on server B too?
I run PBS in a VM on a Synology, using NFS to access the Syno storage from that VM. This is definitely not a recommended construct, but "it works!". My point is to always have backups on independent hardware. In case of need, the low performance is acceptable for me in my homelab. (Currently I have three PBS instances on separate hardware: that Synology thing, an Odroid SBC with problematic USB disks, and an HP MicroServer. This evolution also shows how my journey went...) And not all hardware PBS instances need to run 24/7; one of these turns itself on once a week (by BIOS) to create a weekly backup and later simply shuts down via cron.
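The BIOS-wake/cron-shutdown pattern mentioned above is just an RTC wake alarm set in the BIOS plus a crontab line like this (day and time are examples):

```shell
# /etc/crontab: power the box off every Sunday at 06:00,
# after the weekly backup window has passed.
# m  h  dom mon dow  user  command
  0  6  *   *   sun  root  /sbin/shutdown -h now
```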
What about creating a cluster? I don't need HA and I don't want to use shared storage
For me a cluster is important. I want to be able to easily move VMs around, and I wanted a fallback server in case of maintenance or hardware trouble. HA is a different story (for me). I've played with it and it works, but I actually do not use it. The parts I need to be available permanently are instantiated redundantly in the classic way, e.g. primary/secondary DNS.
I have three "fat" machines for VMs. The oldest one is physically powered off, purely because of the crazy electricity bill. I turn it on for updates and in advance of planned maintenance of one of the main machines. This works very well for me. I just run "pvecm expected <x>" manually to let the cluster know.
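That manual quorum adjustment, with one of three nodes intentionally powered off, looks like:

```shell
# Tell corosync that only two votes can currently be expected,
# so the two remaining nodes stay quorate:
pvecm expected 2
pvecm status
```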
With ZFS as the base, shared storage is not really necessary. I do daily replication for some (not all!) VMs across my cluster. Again: works for me.
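Such ZFS replication jobs can be created in the GUI or on the CLI; the job ID, VMID, target node, schedule and rate limit below are examples:

```shell
# Replicate VM 100's local ZFS disks to node "pve-b" every 15 minutes,
# limited to 10 MB/s:
pvesr create-local-job 100-0 pve-b --schedule '*/15' --rate 10
pvesr list   # show configured replication jobs
```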
Regarding quorum: if you keep TrueNAS on separate hardware, I would just run a small VM there to deliver the third vote...

Best regards
 
