Garbage Collection for big datasets takes forever to complete

Nov 24, 2020
Hello!

We have quite large datasets and VM backups (~16 TB for the biggest VM, for example) and do hourly snapshots. The deltas are not that large, but the complete VM is around 16 TB.

The problem is that garbage collections are taking forever.

In general, the performance of the whole backup server is very slow.

The PBS is running on a dedicated server in a datacenter.

Specs:

[screenshot: server specs]

Datastore:
[screenshot: datastore overview]

A GC task has been running for 6 days now:

[screenshot: GC task progress]

Please note: between 07-01 and 07-04, only 1 % was marked, and there has been no more progress since. The task is still in phase 1 and has not even started phase 2.

We are running the latest Backup Server 2.2-3.

We are running 8 PBS servers in total, and we see this behaviour on all of them: when the datasets get large, the performance/usability goes to zero.

Any ideas?
 

fabian

Proxmox Staff Member
What kind of storage are you using for the datastore? A garbage collection task does lots of metadata access to the underlying chunk store, which usually translates to random read/write I/O.
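One way to gather that information on the PBS host (a sketch; output will of course depend on your setup, and the ZFS commands only apply if the datastore sits on a ZFS pool):

```shell
# List configured datastores and their backing paths
proxmox-backup-manager datastore list

# If the datastore is on ZFS: show pool layout (raidz vs. mirror, special vdevs)
zpool status

# Watch per-vdev I/O while the GC task runs, 5-second intervals
zpool iostat -v 5
```

High operations-per-second with low bandwidth during GC is the typical signature of metadata-heavy random I/O.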
 


Dunuin

Famous Member
Let's explain it for a better understanding...
Let's say you back up a 16 TB VM. Most backup solutions like Veeam and co. will create a single big 16 TB image file (or a few very big image files when doing differential backups) that can be read and written sequentially, which is fine for HDDs.
In contrast, PBS stores everything as small deduplicated chunk files with a maximum size of 4 MB. Let's say your average chunk file size is 2 MB; then storing a 16 TB VM results in 8 million chunk files.
During a GC, the metadata (atime) of each of those chunk files needs to be read and written, so this causes millions of small random IOs.
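The arithmetic above can be sketched quickly (assuming a 2 MiB average chunk size, as in the example, and a rough HDD random-IOPS figure for illustration):

```python
TIB = 1024**4
MIB = 1024**2

vm_size = 16 * TIB        # logical size of the backed-up VM
avg_chunk = 2 * MIB       # assumed average chunk size (PBS max is 4 MiB)

chunks = vm_size // avg_chunk
print(chunks)             # 8388608 chunk files, i.e. ~8 million

# GC must touch the atime of every chunk file. At an assumed
# ~200 random IOPS for an HDD pool, just those metadata
# operations take on the order of half a day per pass:
hours = chunks * 2 / 200 / 3600   # read + write per chunk
print(round(hours, 1))
```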
HDDs are terrible at IOPS, especially when using raidz1/2/3. That's why a local SSD-only datastore is recommended.
If you are using HDDs, you should at least use a mirror of SSDs as a ZFS special metadata device ("special" vdev) so that all metadata is stored on those fast SSDs and not on the slow HDDs. This will speed up GC by magnitudes, as GC primarily needs to read/write metadata.
Backup, restore and verify jobs would of course still be very slow, as these need to read/write the data part of millions of files, where a special metadata device won't help that much (but still a bit, because metadata IO is offloaded to the SSDs, so the HDDs don't need to handle it too and more data IO can be read/written instead).
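A sketch of adding such a mirrored special vdev to an existing pool (pool name and device paths are hypothetical; note that a special vdev cannot be removed again from a raidz pool, so test on non-production hardware first):

```shell
# Add a mirrored pair of SSDs as a special (metadata) vdev
# to the hypothetical pool "datastore"
zpool add datastore special mirror \
    /dev/disk/by-id/nvme-SSD-A /dev/disk/by-id/nvme-SSD-B

# Verify the new vdev shows up in the pool layout
zpool list -v datastore
```

Only metadata written after adding the vdev lands on the SSDs; existing chunks keep their metadata on the HDDs until the data is rewritten.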

PS: The special metadata device SSDs only need to be around 0.4 % of the size of your pool, so it's not that expensive to add 2, 3 or 4 SSDs as a mirror to your pool and potentially speed up all jobs by something like factor 2-3 and GC by something like factor 100.
See here for special device sizing examples and explanations: https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954
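The 0.4 % rule of thumb works out like this (a sketch only; the real metadata fraction depends on recordsize and the number of files, so check the sizing link above for your workload):

```python
def special_vdev_size_gib(pool_tib, metadata_fraction=0.004):
    """Estimate special-vdev capacity for ~0.4 % metadata overhead."""
    return pool_tib * 1024 * metadata_fraction

# e.g. a hypothetical 50 TiB datastore pool
print(round(special_vdev_size_gib(50), 1))  # 204.8 (GiB)
```

So even for a large pool, a mirror of modest consumer-sized SSDs is enough capacity-wise (endurance still matters for metadata-heavy writes).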
 
