Garbage Collection for big datasets takes forever to complete

Nov 24, 2020
Hello!

We have quite large datasets and VM backups (~16 TB for the biggest VM, for example) and do hourly snapshots. The deltas are not that large, but the complete VM is around 16 TB.

The problem is that garbage collections are taking forever.

In general, the performance of the whole backup server is very slow.

The PBS is running on a dedicated server in a datacenter.

Specs:

[screenshot: server specs]

Datastore:
[screenshot: datastore overview]

A GC task has been running for 6 days now:

[screenshot: GC task progress]

Please note: between 07-01 and 07-04, only 1 % was marked, and there has been no more progress since. The task is still in phase 1 and has not even started phase 2.

We are running the latest Backup Server 2.2-3.

We are running 8 PBS servers in total, and we see this behaviour on all of them: when the datasets get large, the performance/usability goes to zero.

Any ideas?
 

fabian

Proxmox Staff Member
What kind of storage are you using for the datastore? A garbage collection task does lots of metadata access to the underlying chunk store, which usually translates to random read/write I/O.
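One way to gather that information on the PBS host (a sketch; output will of course depend on your setup, and the ZFS commands only apply if the datastore sits on a ZFS pool):

```shell
# List configured datastores and their backing paths
proxmox-backup-manager datastore list

# If the datastore is on ZFS: show pool layout (raidz vs. mirror, special vdevs)
zpool status

# Watch per-vdev I/O while the GC task runs, 5-second intervals
zpool iostat -v 5
```

High operations-per-second with low bandwidth during GC is the typical signature of metadata-heavy random I/O.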
 


Dunuin

Famous Member
Let's explain it for a better understanding...
Let's say you back up a 16 TB VM. Most backup solutions like Veeam and co. will create a single big 16 TB image file (or a few very big image files when doing differential backups) that can be read and written sequentially, which is fine for HDDs.
In contrast, PBS stores everything as small deduplicated chunk files with a maximum size of 4 MB. Let's say your average chunk file size is 2 MB; then storing a 16 TB VM results in 8 million chunk files.
During a GC, the metadata (atime) of each of those chunk files needs to be read and written, so this causes millions of small random IOs.
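The arithmetic above can be sketched quickly (assuming a 2 MiB average chunk size, as in the example, and a rough HDD random-IOPS figure for illustration):

```python
TIB = 1024**4
MIB = 1024**2

vm_size = 16 * TIB        # logical size of the backed-up VM
avg_chunk = 2 * MIB       # assumed average chunk size (PBS max is 4 MiB)

chunks = vm_size // avg_chunk
print(chunks)             # 8388608 chunk files, i.e. ~8 million

# GC must touch the atime of every chunk file. At an assumed
# ~200 random IOPS for an HDD pool, just those metadata
# operations take on the order of half a day per pass:
hours = chunks * 2 / 200 / 3600   # read + write per chunk
print(round(hours, 1))
```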
HDDs are terrible at IOPS, especially when using raidz1/2/3. That's why a local SSD-only datastore is recommended.
If you are using HDDs, you should at least use a mirror of SSDs as a ZFS special metadata device ("special" vdev) so that all metadata is stored on those fast SSDs and not on the slow HDDs. This will speed up GC by magnitudes, as GC primarily needs to read/write metadata.
Backup, restore and verify jobs would of course still be very slow, as these need to read/write the data part of millions of files, where a special metadata device won't help that much (but still a bit, because metadata IO is offloaded to the SSDs, so the HDDs don't need to handle it too and more data IO can be read/written instead).
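A sketch of adding such a mirrored special vdev to an existing pool (pool name and device paths are hypothetical; note that a special vdev cannot be removed again from a raidz pool, so test on non-production hardware first):

```shell
# Add a mirrored pair of SSDs as a special (metadata) vdev
# to the hypothetical pool "datastore"
zpool add datastore special mirror \
    /dev/disk/by-id/nvme-SSD-A /dev/disk/by-id/nvme-SSD-B

# Verify the new vdev shows up in the pool layout
zpool list -v datastore
```

Only metadata written after adding the vdev lands on the SSDs; existing chunks keep their metadata on the HDDs until the data is rewritten.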

PS: The special metadata device SSDs only need to be around 0.4 % of the size of your pool, so it's not that expensive to add 2, 3 or 4 SSDs as a mirror to your pool and potentially speed up all jobs by something like factor 2-3 and GC by something like factor 100.
See here for special device sizing examples and explanations: https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954
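The 0.4 % rule of thumb works out like this (a sketch only; the real metadata fraction depends on recordsize and the number of files, so check the sizing link above for your workload):

```python
def special_vdev_size_gib(pool_tib, metadata_fraction=0.004):
    """Estimate special-vdev capacity for ~0.4 % metadata overhead."""
    return pool_tib * 1024 * metadata_fraction

# e.g. a hypothetical 50 TiB datastore pool
print(round(special_vdev_size_gib(50), 1))  # 204.8 (GiB)
```

So even for a large pool, a mirror of modest consumer-sized SSDs is enough capacity-wise (endurance still matters for metadata-heavy writes).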
 
