Sanity check for Remote backups to central office

QuimaxW — Dec 14, 2023
I'm not new to Linux or system administration, just to Proxmox and Proxmox Backup. We're moving our remote offices from ESXi hosts to Proxmox machines, and from having basically no backups to full backups. Because, well, in 10 years we've been very blessed not to have any data loss despite running without backups.

We have a central office, with a whopping 30Mbps internet, and our main servers. I'm looking to have a PBS server here for central & off-site backup of the 4 remote locations. Each remote location has a single host running 3 or 4 VMs. Largest data drive is about 500GB. Internet connections are vSat or 10Mbps fibre, ya know, slow & expensive. All sites are connected through site-to-site VPN. Having never done this before and reading the manuals, I'm seeing 2 ways of doing this:

1) Single PBS server at central location. The remote servers back up to this directly. I'm certain this will work, but I have no clue about performance. What happens if a backup doesn't finish? Do backups ever time out if they are taking too long? Restores would have to go back over the WAN as well. This does seem like the "simplest" method though.
or
2) PBS server at each site, either alongside PVE or as a VM/container. Initial backups are local, then the central PBS server would pull the repo "once in a while" for the off-site requirement. I think this adds complexity, but it would keep restores on the local network. Don't know how the WAN usage compares though.

Anyone have input? Done this before? I know there is nothing new under the sun, so there is bound to be someone who has experience with this...
 
Hi!
Single PBS server at central location. The remote servers backup to this directly. I'm certain this will work, but I have no clue about performance.
This is not optimal, especially for the initial backup after freshly starting the VM.
In short, and a bit hand-wavy: when a backup starts, the guest state is snapshotted, so any new write to a part of the VM disk that hasn't been saved yet must be queued while that original block is written out.
One needs to be a bit unlucky to hit an IO pattern that just trails the backup job and fills the buffer for such "not yet saved" areas, but the higher the latency and the lower the bandwidth between the Proxmox VE host running the VM and the Proxmox Backup Server, the easier it is to run into this.

The effect of running into this is IO wait inside the VM, and if IO threads are NOT enabled for the disks, the VM might stall even more (IO threads are enabled by default for all VMs created through the web UI).

Now, I also said this might be worse on the first backup after a fresh start: afterwards we can set up a dirty bitmap for running VMs, so the next backup knows which blocks actually changed, and therefore what it needs to back up, without touching the storage for anything else. Still, the areas that changed once are likely to change again, so a terrible connection might still cause some of these effects.

How much this affects you depends heavily on the write load in the VMs; for containers it's generally less of an issue, as there PVE/PBS can take other routes to read the data to back up.
If you try to run a backup of the VM with the most frequent writes, you should see how well it fares in your situation.
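To sketch that test, a one-off backup of the busiest VM can be triggered from the PVE host's shell (the VMID 101 and the storage name "pbs-central" below are placeholders, substitute your own):

```shell
# One-off snapshot-mode backup of a write-heavy VM to the central PBS
# over the WAN (hypothetical VMID and storage name):
vzdump 101 --storage pbs-central --mode snapshot

# While it runs, watch for IO wait inside the guest, e.g. with
# `vmstat 5` (the "wa" column) or `iostat -x 5`.
```

If the "wa" figures inside the guest climb noticeably while the backup runs, that is the stall effect described above.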

2) PBS server at each site, either along side PVE or as a VM/container. Initial backups are local, then the central PBS server would pull the repo "once in a while" for the off-site requirement. I think this adds complexity, but it would keep restores on the local network. Don't know how the WAN usage compares though.
IMO this might be better in the long run: you get fast local backups with minimal interference for any virtual guest, and can then sync those backups asynchronously. Since PBS uses content-addressable storage and only syncs deltas, the "once in a while" could be relatively often.
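Such a pull sync can be set up on the central PBS with `proxmox-backup-manager`; a minimal sketch, assuming hypothetical names, addresses, and credentials you would replace with your own:

```shell
# On the central PBS, register the remote site's PBS instance
# (host, auth-id, password, and fingerprint are placeholders):
proxmox-backup-manager remote create site-a \
    --host 10.0.1.10 \
    --auth-id 'sync@pbs' \
    --password 'SECRET' \
    --fingerprint 'aa:bb:cc:...'   # the remote's TLS certificate fingerprint

# Pull the remote datastore into the local "offsite" datastore every night:
proxmox-backup-manager sync-job create pull-site-a \
    --store offsite \
    --remote site-a \
    --remote-store local-backups \
    --schedule 'daily'
```

Because only changed chunks cross the WAN, a daily (or even more frequent) schedule is usually far cheaper than it sounds.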

And while yes, initial setup complexity would be a bit higher, and you would need to handle upgrades for a few more instances, it's IMO not that much extra overhead: if one instance upgrades fine in your environment, the others likely will too.
Also, the benefit of fast local backups is already quite nice when making them, but it's even nicer when you actually have to restore one.
Not having to wait a few extra hours for some critical VM to be up and running again because you're pulling 500 GB (still a few hundred GB when compressed) through a 10 Mbps line will be a real lifesaver IMO, as such failures often happen at the worst possible time.
W.r.t. installing side by side: while we generally recommend a dedicated backup server, for sites with only a small number of hosts it can be totally fine to share a host between PBS and PVE. Whether bare metal, VM, or CT is better mostly depends on how you would service them. Having PBS in a VM or even a CT decouples the version lock-step, i.e., if PVE and PBS are installed on the same host without VM/CT separation, you'd need to sync them on major upgrades – not a biggie, especially as PBS is quite easy to take care of, but still something to keep in mind.
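For the bare-metal side-by-side variant, installing PBS on an existing PVE host is roughly this (a sketch assuming a Debian 12 "Bookworm" based installation, adjust the release name to match yours):

```shell
# Add the PBS no-subscription repository on the PVE host
# (the Proxmox signing key is already present on PVE systems):
echo 'deb http://download.proxmox.com/debian/pbs bookworm pbs-no-subscription' \
    > /etc/apt/sources.list.d/pbs.list

apt update
apt install proxmox-backup-server
```

With the VM/CT variant you would instead install PBS from its ISO (or the same package repo) inside the guest, which is what keeps the upgrade cycles independent.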
 
