Most efficient sync / prune / garbage collect strategy

sukerman · Oct 13, 2020

Hola,

I have a production server. I do nightly backups, which take about 6 hours. I want local backups to restore as fast as possible, so I have Proxmox Backup Server installed locally. In case the whole server is lost, I also sync the backups to a second, remote Proxmox Backup Server. Backup data on both backup servers is stored on spinning-drive RAID arrays.

q1) Should the production backup server avoid doing any prunes / garbage collect when I know the backup is likely still running? or is it / will it be someday automatically paused?

q2) Should the remote server avoid remote syncing with the production server whilst it is backing up / pruning / gc? Should the remote sync use 'delete vanished', or allow the copy then do local prune/gc. In that case should I avoid any kind of prune / gc during the remote sync?

q3) Should I avoid setting prune / gc, to hourly otherwise potentially they are running at the same time?

q4) Does 'delete vanish' actually delete anything? or does it require a prune, gc, or both?

Gut feeling says getting any hdd to do two things at once is probably not the best idea but any advice please on the least resource intensive / fastest method please?

Cheers,

Jack

Stefan_R · Oct 15, 2020

sukerman said:
q1) Should the production backup server avoid doing any prunes / garbage collect when I know the backup is likely still running? or is it / will it be someday automatically paused?

In the current public version (0.9.1) there is a bug where prune and GC started at the same time may lead to a failed GC; this will be fixed in the next release. Apart from that, doing prune/GC/verify while a backup is happening is perfectly fine (concurrency is one of the strong points of PBS; nothing should interfere with anything else if at all avoidable).

In general though, the right order of operations is to run prune first, and then (scheduled an hour or so later) run GC. This way GC can collect any chunks that are no longer needed after the prune. If you want, you can also schedule a verify after that, though keep in mind that verify scheduling will be improved in coming versions too.

sukerman said:
q2) Should the remote server avoid remote syncing with the production server whilst it is backing up / pruning / gc? Should the remote sync use 'delete vanished', or allow the copy then do local prune/gc. In that case should I avoid any kind of prune / gc during the remote sync?

Again, notwithstanding bugs, all operations should be safe to do concurrently.

Using 'delete vanished' removes any snapshots on the sync target that no longer exist on the source, which I'd recommend if you want the same 'prune' schedule on both nodes. Alternatively, leave it off and set up a different 'prune' schedule on the target. That way you can, for example, keep older snapshots alive on your secondary without wasting space on your production machine.

For scheduling, it probably makes sense to run the sync after 'prune', so it doesn't sync snapshots that are about to be removed anyway.

sukerman said:
q3) Should I avoid setting prune / gc, to hourly otherwise potentially they are running at the same time?

Prune/GC every hour is not necessary. I'd recommend scheduling them once per day, with about an hour in between (depending on your average runtimes). See my answer to q1.
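To make the "once per day, about an hour apart" advice concrete, here is a minimal sketch that derives staggered daily start times. The function name, the 06:00 start, and the placement of the sync job between prune and GC are assumptions for illustration only; the thread only states that prune should come before GC and before the sync.

```python
from datetime import datetime, timedelta


def staggered_schedule(prune_at: str, gap_minutes: int = 60) -> dict:
    """Return daily HH:MM start times for prune, sync and GC, spaced by gap_minutes.

    The order prune -> sync -> gc is an assumption; adjust the gap to your
    own average job runtimes.
    """
    start = datetime.strptime(prune_at, "%H:%M")
    jobs = ["prune", "sync", "gc"]
    return {
        job: (start + timedelta(minutes=i * gap_minutes)).strftime("%H:%M")
        for i, job in enumerate(jobs)
    }


# Hypothetical example: the nightly backup has finished by 06:00.
schedule = staggered_schedule("06:00")
for job, when in schedule.items():
    print(f"{job:5s} daily at {when}")
```

The resulting times would still have to be entered by hand into the respective job schedules; this sketch does not talk to PBS.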

sukerman said:
q4) Does 'delete vanish' actually delete anything? or does it require a prune, gc, or both?

It behaves the same as doing 'forget'. Thus, it deletes the snapshot metadata, not the actual chunks. You still need to schedule a GC on the target node for that.

In general, also measure your disk utilization (using 'atop' or similar) and see for yourself how your system scales.
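If you want a number rather than watching 'atop' interactively, one possible approach on Linux is to sample /proc/diskstats twice and compare the "time spent doing I/O" counter. The sketch below only parses two such lines; the sample lines and the 10-second interval are made up for illustration, and reading the real file is left out.

```python
def io_ticks_ms(diskstats_line: str) -> int:
    """Extract the 'milliseconds spent doing I/O' counter from a /proc/diskstats line."""
    fields = diskstats_line.split()
    # Layout: major, minor, device name, then the per-device counters;
    # the tenth counter (list index 12) is the time spent doing I/O in ms.
    return int(fields[12])


def busy_fraction(before: str, after: str, interval_s: float) -> float:
    """Fraction of the sampling interval during which the device was busy."""
    delta_ms = io_ticks_ms(after) - io_ticks_ms(before)
    return min(1.0, delta_ms / (interval_s * 1000.0))


# Hypothetical samples for one device, taken 10 seconds apart.
before = "8 0 sda 1000 0 80000 5000 2000 0 160000 9000 0 12000 14000"
after = "8 0 sda 1500 0 120000 7500 2600 0 208000 12000 1 19500 21500"
print(f"sda busy: {busy_fraction(before, after, 10.0):.0%}")
```

Comparing this figure with and without an overlapping prune, GC or sync job gives a rough idea of whether the spinning disks are the bottleneck.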

sukerman · Oct 15, 2020

Thanks a lot. It means a great deal to get quality support and replies when you're relying on something for backups. I'm glad I switched from restic to PBS, which of course makes sense if you use Proxmox, and I'm glad it's good.

Good work! Thanks again.

RobFantini · Oct 18, 2020

I had assumed that a remote sync target did not need GC, since it should be a duplicate of the main PBS system. However, that is not the case.

The remote had 2TB+ more disk usage after a couple of months of syncs. Running GC fixed that.

Question: once PBS is stable, should GC still be needed at the remote?

Stefan_R · Oct 19, 2020

RobFantini said:
question: once pbs is stable, should gc still be needed at the remote?

Yes, that is how it's intended to work. The chunk store and the snapshot metadata are treated as two separate entities. This is what allows deduplication even between backups of different hosts/VMs/CTs. The only connection between the two is writing chunks on backup, and removing them on GC - and since "sync" works on the snapshot level, it cannot remove chunks that are no longer needed. Running GC on target nodes is thus recommended if you run with "remove vanished" enabled.
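The separation described above can be illustrated with a toy model. This is a simplified sketch and not how PBS is implemented: the class, its method names and the snapshot names are hypothetical, and real garbage collection is more involved than this plain mark-and-sweep.

```python
import hashlib


class Datastore:
    """Toy datastore: a shared chunk store plus per-snapshot chunk lists."""

    def __init__(self):
        self.chunks = {}     # digest -> chunk data, shared by all snapshots
        self.snapshots = {}  # snapshot name -> list of chunk digests

    def backup(self, name, blocks):
        """Store each block once by digest and record which digests the snapshot uses."""
        digests = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            self.chunks.setdefault(digest, block)  # deduplicates identical blocks
            digests.append(digest)
        self.snapshots[name] = digests

    def forget(self, name):
        """Remove only the snapshot metadata; chunks are left untouched."""
        del self.snapshots[name]

    def garbage_collect(self):
        """Delete chunks that no remaining snapshot references; return how many."""
        referenced = {d for digests in self.snapshots.values() for d in digests}
        unused = [d for d in self.chunks if d not in referenced]
        for digest in unused:
            del self.chunks[digest]
        return len(unused)


store = Datastore()
store.backup("vm/100/monday", [b"os-image", b"data-v1"])
store.backup("vm/101/monday", [b"os-image", b"other-data"])
print(len(store.chunks))        # shared block is stored once

store.forget("vm/100/monday")   # what 'remove vanished' amounts to on the target
print(len(store.chunks))        # unchanged: forgetting frees no space

print(store.garbage_collect())  # only GC reclaims the unreferenced chunk
print(len(store.chunks))
```

In this model, forgetting a snapshot never shrinks the chunk store, which matches the observation that a sync target keeps growing until GC runs.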

RobFantini · Oct 19, 2020

Stefan_R said:
Yes, that is how it's intended to work. The chunk store and the snapshot metadata are treated as two separate entities. This is what allows deduplication even between backups of different hosts/VMs/CTs. The only connection between the two is writing chunks on backup, and removing them on GC - and since "sync" works on the snapshot level, it cannot remove chunks that are no longer needed. Running GC on target nodes is thus recommended if you run with "remove vanished" enabled.

That makes sense. So we will always have GC set up at the remotes.
