Incremental backup with multiple datastores

Pay4Property · Aug 19, 2023

I'm moving our environment from Hyper-V to Proxmox and so far I like what I am seeing.

I've set up Proxmox Backup Server with a number of datastores connected via iSCSI over 1Gb Ethernet to 3 NAS devices and can successfully back up to each datastore. I understand that I would get better performance backing up to a local SSD, but I get good enough speed to the NAS devices and they meet my requirement of having the backups stored physically distant from the host.

As PBS stores incremental backups I need to understand how it does this. Presumably, the first time a VM is backed up a full backup is stored and from then on just the changes, possibly doing another full backup periodically or effectively re-creating one when older backups are pruned.

My key question is whether this is done per datastore or per PBS server?

For example, if I back up VM100 to datastore A, then later do another backup of VM100 to datastore A, presumably it stores a full then an incremental backup on datastore A. If I subsequently back up VM100 to datastore B, is this an increment also referencing the full on datastore A, or is it a full backup in its own right that could be successfully restored even if datastore A is unavailable?

My primary concern is that in the event of total loss of a host and 1 of the datastores I want to be able to install a new host, install PBS, link it to one of the remaining datastores and restore everything. That's why I need to understand whether the loss of 1 datastore would prevent successful restore as that defeats my objective of having remote backup storage in multiple distant locations for redundancy.

Thanks for any advice.

Dunuin · Aug 19, 2023

Pay4Property said:
I've set up Proxmox Backup Server with a number of datastores connected via iSCSI over 1Gb Ethernet to 3 NAS devices and can successfully back up to each datastore. I understand that I would get better performance backing up to a local SSD, but I get good enough speed to the NAS devices and they meet my requirement of having the backups stored physically distant from the host.

The biggest problems are not backup or restore performance but the maintenance tasks like a full re-verify or a GC once you have some TBs of backups. To test whether it is fast enough, fill up your datastores to a good amount (but keep in mind you can only delete stuff 24 hours later), then do a GC and a full re-verify and see if it still works for you.

Pay4Property said:
As PBS stores incremental backups I need to understand how it does this. Presumably, the first time a VM is backed up a full backup is stored and from then on just the changes, possibly doing another full backup periodically or effectively re-creating one when older backups are pruned.

All backups are full backups. But a backup can be full and incremental at the same time because of how deduplication works. PBS isn't using differential backups like other backup solutions often do.
Basically PVE will split all your data into chunks of up to 4MB, then hash, compress and optionally encrypt them. Because of deduplication, none of those chunks will be stored more than once per datastore. Each backup references all of its chunks, so it is a full backup, but the same chunk can be used by multiple backup snapshots. So it is also incremental, as no chunk that already exists on the PBS needs to be uploaded again.
Another thing is dirty bitmapping, where PVE keeps track of which blocks of the virtual disks changed, so it can skip the blocks that were unchanged since the last backup. But this dirty bitmap will be dropped once you restart the server or the VM, so then the whole virtual disk will have to be read and hashed again.
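
To make the chunk idea concrete, here is a toy Python sketch. It is only an illustration of content-addressed deduplication and not the real PBS on-disk format: the `Datastore` class, its method names and the 4-byte chunk size are all made up for the example, and pruning and garbage collection are collapsed into one step.

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # tiny on purpose; the post above talks about chunks of up to 4MB


class Datastore:
    """Toy content-addressed chunk store (hypothetical, not the PBS format)."""

    def __init__(self):
        self.chunks = {}     # digest -> compressed chunk, stored once per datastore
        self.snapshots = {}  # snapshot name -> ordered list of chunk digests

    def backup(self, name, data):
        """Record a full index for the snapshot; return the number of new chunks."""
        index, uploaded = [], 0
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unknown chunks are stored
                self.chunks[digest] = zlib.compress(chunk)
                uploaded += 1
            index.append(digest)
        self.snapshots[name] = index
        return uploaded

    def restore(self, name):
        """Rebuild the data from the snapshot's own index."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.snapshots[name])

    def prune(self, name):
        """Drop a snapshot, then discard chunks no remaining snapshot references."""
        del self.snapshots[name]
        live = {d for idx in self.snapshots.values() for d in idx}
        self.chunks = {d: c for d, c in self.chunks.items() if d in live}


store = Datastore()
first = store.backup("day1", b"AAAABBBBCCCCDDDD")   # every chunk is new
second = store.backup("day2", b"AAAABBBBXXXXDDDD")  # only the changed chunk is new
store.prune("day1")                                 # day2 still restores on its own
```

In this sketch the second backup stores a single new chunk, yet its index lists every chunk of the disk, which is why removing the older snapshot does not break the newer one.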

Pay4Property said:
My key question is whether this is done per datastore or per PBS server?

Deduplication is done per datastore. If you want a better deduplication rate you could work with a single datastore but multiple namespaces instead of using multiple datastores.

Pay4Property said:
For example, if I back up VM100 to datastore A, then later do another backup of VM100 to datastore A, presumably it stores a full then an incremental backup on datastore A. If I subsequently back up VM100 to datastore B, is this an increment also referencing the full on datastore A, or is it a full backup in its own right that could be successfully restored even if datastore A is unavailable?

Both datastores will then need to store a copy of the whole data of that VM.
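
As a rough illustration of why losing one datastore does not affect the other, the following Python sketch models two datastores as two completely separate chunk pools. The `Datastore` class and the snapshot names are hypothetical and only mimic the behaviour described above; this is not PBS code.

```python
import hashlib


class Datastore:
    """Toy chunk store; each instance has its own private chunk pool."""

    def __init__(self):
        self.chunks = {}     # digest -> chunk bytes
        self.snapshots = {}  # snapshot name -> ordered list of digests

    def backup(self, name, data, chunk_size=4):
        index = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # dedup only within this store
            index.append(digest)
        self.snapshots[name] = index

    def restore(self, name):
        return b"".join(self.chunks[d] for d in self.snapshots[name])


vm100 = b"AAAABBBBCCCCDDDD"
store_a, store_b = Datastore(), Datastore()
store_a.backup("vm100-mon", vm100)
store_a.backup("vm100-tue", vm100)  # adds no new chunks to A
store_b.backup("vm100-wed", vm100)  # B has never seen these chunks, so it stores them all

del store_a                         # simulate the total loss of datastore A
restored = store_b.restore("vm100-wed")
```

Nothing in `store_b` points into `store_a`, so the restore from B works with A gone. The price is that the same data is held once in each datastore.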

Pay4Property said:
My primary concern is that in the event of total loss of a host and 1 of the datastores I want to be able to install a new host, install PBS, link it to one of the remaining datastores and restore everything. That's why I need to understand whether the loss of 1 datastore would prevent successful restore as that defeats my objective of having remote backup storage in multiple distant locations for redundancy.

PBS needs IOPS performance, which will be terrible if you access the storage over the internet because of the additional latency. You are doing millions upon millions of small random IO operations when doing stuff like a GC. If you want an offsite backup you should have a second PBS offsite with local storage. You can then use a sync job to tell the offsite PBS to pull the latest backup snapshots from the local PBS. This could also give you some ransomware protection, and in case your local PBS dies you could restore directly from the remote PBS.

Pay4Property · Aug 19, 2023

Thanks for your reply.

For maintenance tasks, hopefully using NAS storage won't be a problem as there shouldn't be vast amounts of data. The VMs are a couple of webservers, a database server and a PBX, and the data changes/grows pretty slowly. The Hyper-V backup only took around 1.5 TB for all backups and that was without deduplication. I'm also reducing my backup retention, so I would expect the size of each PBS datastore to be under 1 TB.

I understand that for truly remote backup over the internet I should have a separate PBS server at the other end. I may do that in the future, but currently the NAS devices are simply in different buildings on the same site LAN, so latency isn't an issue.

From what you say about each datastore effectively being independent, that's exactly what I need so I can do a full restore from any 1 of the datastores remaining after a "disaster".

Thanks
Chris

Pay4Property · Sep 11, 2023

I thought I'd just post a quick update now Proxmox VE has been in use for 2 months and PBS has been in use for a month or so.

It is working incredibly well, which definitely supports the case for moving away from Hyper-V. Performance-wise there seems very little difference between Hyper-V and Proxmox VE, though none of our VMs are running particularly demanding tasks. In terms of management, though, I feel much more comfortable with Proxmox VE even though I've been a "Windows boy" for 20+ years. Hyper-V is a bit of a "black box", so if it works it's fine, but when you have anything more than a simple issue it's very difficult to find out what's causing the problem. With Proxmox VE I've found I have a more solid hypervisor interface that has already helped diagnose issues, and when a VM goes crazy it doesn't take down the whole system.

A great example is that we've had a problem with the VMs on Hyper-V suddenly becoming unavailable, at which point the Hyper-V remote management interface also stopped working, so I couldn't connect to get any useful information while the problem was actually active and the only way to resolve the issue was to hard reset the hardware. After moving the VMs across to Proxmox VE they all worked fine for several weeks until last week, when one stopped working and the others were running a bit slower. Fortunately the Proxmox VE management UI wasn't affected, unlike Hyper-V, so I was able to connect remotely (from 5000 miles away sat on the beach!) and see what was going on. It turns out that the affected VM was showing 100% CPU, which on Hyper-V obviously killed the whole system, but Proxmox VE coped admirably, allowing me to reset the VM and then look at what caused the issue, as I knew from the CPU stats when the problem started. A problem that had occurred many times and had frustrated me for months on Hyper-V was solved in an hour the first time it happened on Proxmox VE.

In respect of PBS I'm also hugely impressed. Thanks to Dunuin's guidance I now have reliable backup to 2 physically separate datastores on NAS devices, just as I had with Hyper-V using Nakivo Backup, BUT it's much better for 2 reasons. Firstly, our daily backup now takes just 7 minutes every day rather than around 10 minutes for daily incrementals and 1 hour for the weekly full. That's only a small benefit, though, compared to the second reason PBS is better, which is the vast saving in space. Compared to some other backup systems for Hyper-V, Nakivo was relatively efficient with space, but compared to PBS it's completely bloated! As the data on the servers I've currently moved to Proxmox doesn't change a great deal daily, we're achieving a de-duplication factor of 23.53, and a month's worth of backups is taking around 20% of the space it took for the same servers on Hyper-V / Nakivo.

Despite using remote NAS storage, the PBS datastore maintenance jobs (prune / verify) only take around 30 minutes once a week and have little / no impact on performance as they are run outside hours, so although such storage isn't recommended it's perfectly acceptable for what I need.

I'll shortly be porting our final VM away from Hyper-V onto Proxmox VE and the data on that changes a bit more so we'll see what impact that has on backup times and maintenance jobs but from what I've seen so far I have every confidence that it will continue to perform well above expectations.

The Proxmox VE and PBS teams should be very proud of what they have achieved.
