Cluster and Backup setup questions

Dunuin

Distinguished Member
Jun 30, 2020
Hi,

Right now my Homelab looks like this:

Server A:
- bare-metal Debian with PVE 7.1 on top
- runs 24/7
- local ZFS storage for VM/LXC disks backed up to PBS on server B
- I back up the two system disks to my PBS VM on server B by booting from a Debian USB stick and then using proxmox-backup-client to do block-level backups of them
- many VMs need to access SMB/NFS shares on server B
- got most of my RAM and CPU power (64GB RAM + 32 threads)
- running all my guests (21 VMs + 5 LXCs) except for my backup Pi-hole VM, backup OPNsense VM and my PBS 2.1 VM
- using 4x Gbit LACP bond for LAN/DMZs + 1x 10Gbit for storage backend + 1x Gbit directly connected to Server B for low latency (pfsync)
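The block-level system-disk backup described above can be sketched roughly like this, run from the booted Debian USB stick; the repository name, credentials and disk paths below are made-up placeholders:

```shell
# Sketch of a block-level host backup with proxmox-backup-client.
# Repository, password and disk IDs are example placeholders.
export PBS_REPOSITORY='root@pam@pbs.home.lan:datastore1'
export PBS_PASSWORD='secret'   # or use an API token instead

# Back up both system disks as raw .img archives:
proxmox-backup-client backup \
    disk0.img:/dev/disk/by-id/ata-SYSTEMDISK1 \
    disk1.img:/dev/disk/by-id/ata-SYSTEMDISK2 \
    --backup-type host --backup-id serverA
```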

Server B:
- bare metal TrueNAS Core 12.0
- runs 24/7
- centralized storage for all hosts, so every host needs access to its SMB/NFS shares
- only 32GB RAM + 8 threads, so it can't run many guests
- running my backup OPNsense VM, my backup Pi-hole VM and my PBS 2.1 VM; the PBS datastore is stored locally, accessed via NFS
- using 1x Gbit for LAN/DMZs + 1x 10Gbit for storage backend + 1x Gbit directly connected to Server A for low latency (pfsync)

Server C:
- bare metal TrueNAS Core 12.0
- runs for just a few hours once per week to receive ZFS replication from server B and shuts down afterwards
- only 16GB RAM + 8 threads (same mainboard/HDDs/CPU as server B)
- doesn't run any guests
- using just 1x Gbit for LAN/DMZs (would like to add a 10Gbit NIC too, but the original idea was to use a 300Mbit Wifi bridge and put it in the basement as a sort of pseudo-offsite backup where I couldn't run any cables)


TrueNAS is fine if you just want to serve some SMB/NFS shares, but it really sucks when working with virtualization, complicated network layouts, monitoring, customization and so on. I would really prefer to run PVE bare metal on server B too (maybe even on all three servers) and then just virtualize TrueNAS in a VM. I already got the two HBAs, so PCI passthrough hopefully isn't a problem (haven't checked the IOMMU groups yet... it's a Supermicro X10SSL-F).
What about creating a cluster? I don't need HA and I don't want to use shared storage (Ceph would be nice, but I can't get a third server that runs 24/7 because of the electricity costs). It would just be nice to be able to offline-migrate guests between nodes and to manage both nodes through the same web UI. Or does a migration still require shared storage? I don't need live migration; I'd be fine with the guest being offline while migrating.
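For reference, an offline migration with only local storage is a one-liner per guest, as long as the target node has a storage with the same ID; the VMIDs and node name below are examples:

```shell
# Offline-migrate the stopped VM 101 to node "pve-b"; local disks
# are copied to the storage with the same ID on the target node.
qm migrate 101 pve-b

# Containers work analogously:
pct migrate 105 pve-b
```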
What about quorum? As far as I understand, I would need to set up a QDevice as a third voter, which isn't a problem, as I've got some spare Raspberry Pis. But what happens if I also install PVE on server C, which is offline most of the time? Will that be fine, or is it problematic because I'd then sometimes have an even number of voters?
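Setting up such a QDevice on a Raspberry Pi is only a few commands; the IP address below is a placeholder:

```shell
# On the Raspberry Pi (Debian-like OS, reachable via SSH as root):
apt install corosync-qnetd

# On every PVE cluster node:
apt install corosync-qdevice

# On one PVE node, register the Pi as the external tie-breaker vote:
pvecm qdevice setup 192.168.1.50
pvecm status   # should now list an additional "Qdevice" vote
```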

And how do I best back up the servers using PBS? Would it be enough to just run one PBS VM on server B with the datastore on NFS on server B too? The datastore is replicated once per week to server C, and I would create a weekly vzdump backup of the PBS VM that could be replicated to server C too. In case server A fails, I would still have PBS and the datastore on server B, so I could restore my PVE system disks and guests. In case server B fails, I would lose the PBS VM and the datastore, but I could start server C, which has the week-old backups of what server B lost, and share them read-only via NFS/SMB. Server A could then restore the vzdump backup of the PBS VM, and I would just need to edit the fstab of the restored PBS VM to change the share that stores the datastore from server B to server C. Or can't PBS do a restore when the dataset is read-only? That way I would get a PBS VM on server A and a datastore on server C, so the PVE system disks and guests of server B could be restored.
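The fstab change mentioned above would be a one-line edit inside the restored PBS VM; the hostnames and paths below are made-up examples:

```shell
# /etc/fstab of the restored PBS VM -- before (datastore on server B):
#   serverB.lan:/mnt/tank/pbs  /mnt/datastore  nfs  defaults     0 0
# ...and after, pointing at the read-only replica on server C:
#   serverC.lan:/mnt/tank/pbs  /mnt/datastore  nfs  ro,defaults  0 0
```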

Another option would be to have a second PBS VM on server A that uses the replicated read-only datastore on server C. That PBS VM would normally be stopped to not waste RAM/CPU, and I would only start it (and server C too, so the copy of the datastore is available) when server B needs to be restored. But would that work at all?

And running two PBS VMs with synchronization would maybe also be an option.
I first thought about running the main PBS VM on server B with the datastore on server B, and then excluding the datastore from replication so it doesn't get replicated from server B to server C. Then I could run a second PBS VM on server C with its datastore also stored on server C, but that gives me two problems: first, server C can barely handle TrueNAS with its low RAM, so there aren't the resources to run guests on it; and second, I would then again need to run VMs using TrueNAS, which I don't like. Maybe it would be an option to run the second PBS VM on server A with just the datastore on an NFS share on server C, and then set up a sync job in PBS so it pulls backups from the PBS VM on server B once a week?
In that case I would just need to start server C when server B fails, so I could restore server B from the PBS on server A. Is it problematic if the PBS VM on server A is running while the datastore on server C isn't accessible because it's shut down most of the time? Or should that be fine as long as the NFS share with the datastore is online again when the GC/prune/sync jobs start?
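Such a weekly pull can be configured on the second PBS instance as a remote plus a sync job; the remote name, host, auth ID, datastore names and schedule below are examples:

```shell
# On the PBS VM on server A: register the main PBS on server B
proxmox-backup-manager remote create pbs-serverB \
    --host 192.168.1.20 \
    --auth-id 'sync@pbs' \
    --password 'secret' \
    --fingerprint '<server B certificate fingerprint>'

# Pull its datastore into the local one (on the server C NFS share)
# once a week, Saturday night:
proxmox-backup-manager sync-job create weekly-pull \
    --remote pbs-serverB --remote-store datastore1 \
    --store datastore-c --schedule 'sat 02:00'
```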
 
Good morning,
you describe a complex situation. I can just add my personal approach for comparison.
And how do I best back up the servers using PBS? Would it be enough to just run one PBS VM on server B with the datastore on NFS on server B too?
I run PBS in a VM on a Synology, using NFS to access the Syno storage from that VM. This is definitely not a recommended construct, but "it works!". My point is to always have backups on independent hardware. In case of need, the low performance is acceptable for me in my homelab. (Currently I have three PBS instances on separate hardware: that Synology thing, an Odroid SBC with problematic USB disks, and an HP MicroServer. This evolution also shows how my journey went...) And not all hardware PBS instances need to run 24/7; one of these turns itself on once a week (by BIOS) to create a weekly backup and later simply shuts down via cron.
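The BIOS-wake/cron-shutdown pattern mentioned above is just an RTC wake alarm set in the BIOS plus a crontab line like this (day and time are examples):

```shell
# /etc/crontab: power the box off every Sunday at 06:00,
# after the weekly backup window has passed.
# m  h  dom mon dow  user  command
  0  6  *   *   sun  root  /sbin/shutdown -h now
```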
What about creating a cluster? I don't need HA and I don't want to use shared storage
For me a cluster is important. I want to be able to easily move VMs around, and I wanted a fallback server in case of maintenance or hardware trouble. HA is a different story (for me). I've played with it and it works, but I actually do not use it. The parts I need to be available permanently are instantiated redundantly in the classic way, e.g. primary/secondary DNS.
I have three "fat" machines for VMs. The oldest one is physically powered off, purely because of the crazy electricity bill. I turn it on for updates and in advance of planned maintenance of one of the main machines. This works very well for me. I just run "pvecm expected <x>" manually to let the cluster know.
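That manual quorum adjustment, with one of three nodes intentionally powered off, looks like:

```shell
# Tell corosync that only two votes can currently be expected,
# so the two remaining nodes stay quorate:
pvecm expected 2
pvecm status
```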
With ZFS as the base, shared storage is not really necessary. I do daily replication for some (not all!) VMs across my cluster. Again: works for me.
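Such ZFS replication jobs can be created in the GUI or on the CLI; the job ID, VMID, target node, schedule and rate limit below are examples:

```shell
# Replicate VM 100's local ZFS disks to node "pve-b" every 15 minutes,
# limited to 10 MB/s:
pvesr create-local-job 100-0 pve-b --schedule '*/15' --rate 10
pvesr list   # show configured replication jobs
```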
Regarding quorum: if you keep TrueNAS on separate hardware, I would just run a small VM there to deliver the third vote...

Best regards
 
