Optimization possible?

richieman

Hello!
I've been using PBS for about a year now, love it!

I know it is recommended to run the server with SSD data disks. Unfortunately I cannot afford that and have to use spinning disks. Backup and restore work masterfully and perform well, but verification and sync seem too slow. At this rate my backup server can't scale much more. I've been wondering for a while now whether certain optimizations would be possible that would make it perform a lot better.

When verifying multiple backups within a single set, PBS verifies each backup in full. Instead, it could verify just the snapshot difference compared to another backup that has already been verified, so only the differences have to be verified. For sync, a similar optimization may be possible by sending just the snapshot difference.

Perhaps this method would be slightly less reliable, but in my case I'd choose performance even if it costs some reliability.

Perhaps a checkbox "only verify/sync snapshot difference" could be implemented?
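
To make the idea concrete, here is a minimal sketch in Python of what "only verify the snapshot difference" could look like. The per-snapshot digest listing, the chunk store layout, and the file names are all assumptions for illustration, not the real PBS API:

```python
import hashlib
from pathlib import Path

def load_digests(snapshot_dir: Path) -> set[str]:
    # Stand-in: assumes one hex SHA-256 digest per line in a hypothetical
    # digests.txt; PBS actually keeps these in its .fidx/.didx index files.
    return set((snapshot_dir / "digests.txt").read_text().split())

def verify_chunk(chunk_store: Path, digest: str) -> bool:
    # Chunks are content-addressed, so re-hashing the file and comparing
    # against its name is the verification step. The "first four hex chars
    # as subdirectory" layout is an assumption about the chunk store.
    data = (chunk_store / digest[:4] / digest).read_bytes()
    return hashlib.sha256(data).hexdigest() == digest

def delta_verify(chunk_store: Path, new_snap: Path, verified_snap: Path) -> bool:
    # Only the chunks the new snapshot does NOT share with an already
    # verified snapshot need to be read from disk.
    todo = load_digests(new_snap) - load_digests(verified_snap)
    return all(verify_chunk(chunk_store, d) for d in todo)
```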

Thanks and have a good day!

Richard
 
I think one problem might be that PBS isn't doing any differential backups. All backup snapshots are full backups, so it will read/verify every chunk.
I think in theory this could work, but for that PBS would need to compare the backup snapshots first to find out what the difference between them is, and keep some list of which chunks have already been verified. I guess that's a lot of extra work, as PBS isn't keeping a database storing chunk information.
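
Purely as an illustration of that "list of already verified chunks" idea (the file name and one-digest-per-line format are invented here; nothing like this exists in PBS today):

```python
from pathlib import Path

class VerifiedChunks:
    # A throwaway persistent set of chunk digests, kept as one digest per
    # line in a plain text file instead of a real database.
    def __init__(self, path: Path):
        self.path = path
        self.digests = set(path.read_text().split()) if path.exists() else set()

    def is_verified(self, digest: str) -> bool:
        return digest in self.digests

    def mark(self, digest: str) -> None:
        self.digests.add(digest)

    def save(self) -> None:
        self.path.write_text("\n".join(sorted(self.digests)) + "\n")
```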
 
I was looking more into it and found some very interesting information here and here.

As it turns out, PBS does incremental backups in two ways depending on the situation: one using 'dirty bitmaps', the other using the SHA-256 checksums of each chunk (the chunks are stored with the checksum as the filename). So as far as backups go, things are already pretty optimal.
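
A rough sketch of why that makes every snapshot "full" logically but incremental on disk (the sharded directory layout is assumed, not taken from the PBS source):

```python
import hashlib
from pathlib import Path

def store_chunk(chunk_store: Path, data: bytes) -> str:
    # The digest doubles as the filename, so identical data from any
    # snapshot lands on the same path and is written to disk only once.
    digest = hashlib.sha256(data).hexdigest()
    path = chunk_store / digest[:4] / digest  # sharded layout, assumed
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return digest  # the snapshot's index records this digest
```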

PBS does keep the chunk information in the 'drive-scsi0.img.fidx' file, so it should be fairly easy to work out what the difference is and only verify/sync that difference.
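
If the fixed index really is just a header followed by a flat array of 32-byte digests (an assumption; the 4096-byte header size below is a guess, so check the PBS source before relying on it), extracting the chunk list and diffing two snapshots could look like this:

```python
from pathlib import Path

HEADER_SIZE = 4096  # assumed fixed-index header size
DIGEST_SIZE = 32    # SHA-256

def read_fidx_digests(fidx_path: Path) -> list[str]:
    # Skip the header, then slice the remainder into 32-byte digests.
    raw = fidx_path.read_bytes()[HEADER_SIZE:]
    n = len(raw) // DIGEST_SIZE
    return [raw[i * DIGEST_SIZE:(i + 1) * DIGEST_SIZE].hex() for i in range(n)]

def snapshot_delta(new_fidx: Path, old_fidx: Path) -> set[str]:
    # Chunks referenced by the new snapshot but not by the old one:
    # the only chunks a delta verify/sync would have to touch.
    return set(read_fidx_digests(new_fidx)) - set(read_fidx_digests(old_fidx))
```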
 
Yes, verify is very inefficient in its current implementation, as it won't keep a record of which chunks have already been verified recently between verify job executions.

so "skip verified" and "re-verify-after" does not help very much across jobs. and it's misleading, too, as from a user perspective you would expect, that this persists across jobs and operates at chunk level and not at snapshot levell.

When your backup data growth is low and you have a verify job running every day, most chunks get re-verified every day even with these skip and re-verify settings: all chunks belonging to the latest snapshots get verified again, and those are not much different from the chunks belonging to the earlier snapshots if there is not much change in your backup data.

So you are effectively not verifying a backup diff at every verify job; you effectively verify the whole current backup dataset, i.e. the size/volume of all VM disks which have been backed up recently.
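
For a rough sense of scale: with, say, 10 TB of backed-up VM disks and about 1% daily change, a chunk-level delta would need to read roughly 100 GB per verify run, while the current behaviour re-reads close to the full 10 TB, around a hundred times more I/O.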

That's why verify takes so long.

This wastes precious disk I/O resources and needlessly burns CPU. A waste of energy.

This needs improvement.

I have some ideas about what could be done.

https://bugzilla.proxmox.com/show_bug.cgi?id=5035

Besides this proof of concept, I think some mechanism should be introduced to timestamp chunk verification and take that into consideration on re-verify.
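
One assumption-laden way such a timestamp mechanism could look (the JSON storage format and the 30-day window are invented for illustration):

```python
import json
import time
from pathlib import Path

REVERIFY_AFTER = 30 * 24 * 3600  # re-verify window in seconds, hypothetical

def load_timestamps(path: Path) -> dict[str, float]:
    # digest -> last-verified epoch, persisted between verify jobs
    return json.loads(path.read_text()) if path.exists() else {}

def needs_verify(timestamps: dict[str, float], digest: str) -> bool:
    # Skip chunks whose last verification is still inside the window.
    return time.time() - timestamps.get(digest, 0.0) > REVERIFY_AFTER

def mark_verified(timestamps: dict[str, float], digest: str) -> None:
    timestamps[digest] = time.time()

def save_timestamps(path: Path, timestamps: dict[str, float]) -> None:
    path.write_text(json.dumps(timestamps))
```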
 