Garbage collection speed

flai_hv · Aug 25, 2024

Hi devs and users,

I use Proxmox in several settings (private and semi-professional).
For this reason I also run two backup servers. All servers do daily backups, some even more often: ~5 TB in total, with ~200 GB of changes per week.
One backup server runs as a VM on a ZFS filesystem, with the data on HDDs and the metadata cached on SSD. The other is "a bit weird": a cloud instance using remotely mounted storage for financial reasons.
Both of them run really well, with decent speed for backup creation and restore, even the remote storage one.

But one thing came up: the speed of garbage collection.
On the remote storage server, one run of garbage collection takes almost 2 days...
On the local storage server, one run takes 10 hours.

I know why this takes so long. PBS goes through all index files, "touches" every chunk an index refers to, and later deletes all untouched chunks. This touching involves the operating system and the filesystem, even if you use features like relatime.
The number of required calls into the OS scales with the amount of data (or, more precisely, the number of chunks) and with the number of index files.
That means if you want to keep higher-frequency backups, this quickly scales to a billion touches per garbage collection cycle.
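To illustrate the scaling, here is a minimal sketch in Python. It is only a counting model, not PBS code: the snapshot names, chunk names and sizes are made up, and each loop iteration merely stands in for one atime update on a chunk file.

```python
from typing import Dict, List


def count_touches(indexes: Dict[str, List[str]]) -> int:
    """Count the touches of a naive mark phase: one per chunk reference."""
    touches = 0
    for chunk_digests in indexes.values():
        for _digest in chunk_digests:
            touches += 1  # stands in for one atime update via the OS
    return touches


def count_unique_chunks(indexes: Dict[str, List[str]]) -> int:
    """Count the distinct chunks that actually need to be marked."""
    return len({d for digests in indexes.values() for d in digests})


# Hypothetical datastore: 30 snapshots of one VM that never changed,
# so every index file references the same 1000 chunks.
snapshots = {
    f"snap-{day}": [f"chunk-{i}" for i in range(1000)] for day in range(30)
}

naive = count_touches(snapshots)          # 30 * 1000 = 30000 touches
needed = count_unique_chunks(snapshots)   # only 1000 distinct chunks
```

In this toy case the mark phase issues 30 times more touches than there are chunks, and the factor grows with every additional snapshot that is kept.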

So, for all the users out there: What is your garbage collection duration? Anybody else out there having an "issue" like that?

But I didn't come here without an idea to change this.
I read (in the code itself) that there were ideas to do this whole process in memory. But that comes with considerable risk and a large memory footprint for big deployments.
The idea I have been trying on a replica of my backup server is deduplication of the touch requests.
There is no benefit in touching a chunk twice within one garbage collection run. The advantage is that this check happens inside PBS, without involving the OS. That can make the process orders of magnitude quicker, especially for deployments that do not run on enterprise SSDs. Even on fast SSDs the internal lookup seems to be quicker than the touch, including on the NVMe SSD I tried it with.
But that means a list of chunks needs to be kept in memory during garbage collection. In numbers: 32 MB per 1 million chunks, and 1 million chunks means 4 TB of VM data with default settings. It makes sense to truncate this list when it reaches a certain limit, which doesn't undermine the basic functionality, it just slows things down a bit. But maybe that is fair if you have a single VM referencing 100 TB+...
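A rough sketch of that idea in Python, for discussion only. It is not the PBS implementation: the class name, the `touch` callback and the `max_entries` limit are all hypothetical, and "truncate" is modelled here as simply clearing the list once the limit is reached.

```python
from typing import Callable, Set


class TouchDeduplicator:
    """Skip repeated touches of the same chunk within one GC run."""

    def __init__(self, touch: Callable[[bytes], None], max_entries: int) -> None:
        self._touch = touch              # hypothetical callback doing the real atime update
        self._max_entries = max_entries  # hypothetical memory limit, in digests
        self._seen: Set[bytes] = set()
        self.touches = 0
        self.skipped = 0

    def mark(self, digest: bytes) -> None:
        if digest in self._seen:
            self.skipped += 1            # already touched in this run, no OS call
            return
        self._touch(digest)
        self.touches += 1
        if len(self._seen) >= self._max_entries:
            # Truncate: forgetting digests is safe, because the worst case
            # is that a chunk gets touched again later.
            self._seen.clear()
        self._seen.add(digest)


touched = []
dedup = TouchDeduplicator(touch=touched.append, max_entries=1_000_000)
for digest in [b"a", b"b", b"a", b"a", b"c"]:
    dedup.mark(digest)
# touched is now [b"a", b"b", b"c"]; two repeated requests were skipped.
```

Every chunk is still touched at least once no matter how small the limit is, so the sweep phase never sees a referenced chunk as unused. Note that the 32 MB per million chunks refers to raw 32-byte digests; a Python set like the one above needs noticeably more memory per entry.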

For the devs: what about the memory footprint? The Proxmox guides mention 1 GB of RAM per 1 TB of storage, so orders of magnitude more than this list would need.
Do you think that such a solution has a chance?
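For reference, the memory numbers can be checked with a few lines of Python. The assumptions: one 32-byte digest per chunk and a fixed chunk size of 4 MiB (my reading of "default settings" for VM data), with binary units used throughout for simplicity. The function names are just for this sketch.

```python
DIGEST_BYTES = 32            # one raw digest per chunk (assumption)
CHUNK_BYTES = 4 * 1024**2    # assumed fixed chunk size for VM data
GIB = 1024**3
TIB = 1024**4


def seen_list_bytes(data_bytes: int) -> int:
    """Memory for one digest per chunk, rounding the chunk count up."""
    chunks = -(-data_bytes // CHUNK_BYTES)  # ceiling division
    return chunks * DIGEST_BYTES


def guideline_ratio(data_bytes: int) -> float:
    """How much larger '1 GiB RAM per 1 TiB' is than the digest list."""
    guideline_bytes = data_bytes / TIB * GIB
    return guideline_bytes / seen_list_bytes(data_bytes)


list_4tib = seen_list_bytes(4 * TIB)      # 32 MiB
list_100tib = seen_list_bytes(100 * TIB)  # 800 MiB
ratio = guideline_ratio(4 * TIB)          # 128.0
```

Under these assumptions the list costs about 8 MiB per TiB of referenced VM data, roughly two orders of magnitude below the RAM guideline.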

BR

Florian

tcabernoch · Aug 26, 2024

flai_hv said:
What is your garbage collection duration? Anybody else out there having an "issue" like that?

Dell gen12 box, mostly SSD with an HDD vdev for capacity.
3 TB of data, 57 Groups, 281 Snapshots

2024-08-25T18:00:42-04:00: starting garbage collection on store
2024-08-25T18:00:42-04:00: Start GC phase1 (mark used chunks)
2024-08-25T18:00:48-04:00: marked 1% (6 of 555 index files)
<snip>
2024-08-25T18:05:15-04:00: processed 99% (2197834 chunks)
2024-08-25T18:05:17-04:00: Removed garbage: 71.507 GiB
2024-08-25T18:05:17-04:00: Removed chunks: 73074
2024-08-25T18:05:17-04:00: Pending removals: 377.241 GiB (in 355233 chunks)
2024-08-25T18:05:17-04:00: Original data usage: 64.99 TiB
2024-08-25T18:05:17-04:00: On-Disk usage: 2.351 TiB (3.62%)
2024-08-25T18:05:17-04:00: On-Disk chunks: 1791641
2024-08-25T18:05:17-04:00: Deduplication factor: 27.65
2024-08-25T18:05:17-04:00: Average chunk size: 1.376 MiB
2024-08-25T18:05:17-04:00: TASK OK

Five minutes.

tcabernoch · Aug 26, 2024

That first one was the primary, baremetal, made to perform.

This might be more fair.
Here's my secondary. It runs as a TrueNAS guest virtual machine, so it's right on top of the storage.
3 TB, 57 Groups, 299 Snapshots

2024-08-26T02:30:00-04:00: starting garbage collection on store
2024-08-26T02:30:00-04:00: task triggered by schedule '2,22:30'
2024-08-26T02:30:00-04:00: Start GC phase1 (mark used chunks)
2024-08-26T02:30:11-04:00: marked 1% (6 of 555 index files)
<snip>
2024-08-26T03:12:53-04:00: processed 99% (2154742 chunks)
2024-08-26T03:13:12-04:00: Removed garbage: 0 B
2024-08-26T03:13:12-04:00: Removed chunks: 0
2024-08-26T03:13:12-04:00: Pending removals: 384.927 GiB (in 355062 chunks)
2024-08-26T03:13:12-04:00: Original data usage: 64.99 TiB
2024-08-26T03:13:12-04:00: On-Disk usage: 2.393 TiB (3.68%)
2024-08-26T03:13:12-04:00: On-Disk chunks: 1821382
2024-08-26T03:13:12-04:00: Deduplication factor: 27.16
2024-08-26T03:13:12-04:00: Average chunk size: 1.377 MiB
2024-08-26T03:13:12-04:00: TASK OK

43 minutes.

BTW this is a non-typical day. I'm moving data around.
The secondary was just deployed, purging data from primary, shuffling it to secondary.

UdoB · Aug 26, 2024

Just another random datapoint: this is a HP MicroServer Gen10, turned on once a week. With rotating rust in a single vdev = 4 drives in Raidz2 = worst case possible. No Special Device involved. Co-installed on PVE in an LXC, using a standard mountpoint for storage.

3.2 TB; 167 Groups, 2842 Snapshots

Code:

2024-08-25T07:06:00+02:00: starting garbage collection on store pbsc
2024-08-25T07:06:00+02:00: task triggered by schedule 'daily'
2024-08-25T07:06:00+02:00: Start GC phase1 (mark used chunks)
2024-08-25T07:07:43+02:00: marked 1% (41 of 4005 index files)
...
2024-08-25T07:36:48+02:00: marked 100% (4005 of 4005 index files)
2024-08-25T07:36:48+02:00: Start GC phase2 (sweep unused chunks)
2024-08-25T07:36:48+02:00: processed 1% (23338 chunks)
...
2024-08-25T07:36:59+02:00: processed 99% (2293891 chunks)
2024-08-25T07:36:59+02:00: Removed garbage: 0 B
2024-08-25T07:36:59+02:00: Removed chunks: 0
2024-08-25T07:36:59+02:00: Original data usage: 74.813 TiB
2024-08-25T07:36:59+02:00: On-Disk usage: 2.738 TiB (3.66%)
2024-08-25T07:36:59+02:00: On-Disk chunks: 2317035
2024-08-25T07:36:59+02:00: Deduplication factor: 27.33
2024-08-25T07:36:59+02:00: Average chunk size: 1.239 MiB
2024-08-25T07:36:59+02:00: TASK OK

30 minutes; I am surprised about zero chunks removed... obviously there was no backup in between...

The week before:

Code:

2024-08-18T09:53:00+02:00: starting garbage collection on store pbsc
2024-08-18T09:53:00+02:00: task triggered by schedule 'daily'
2024-08-18T09:53:00+02:00: Start GC phase1 (mark used chunks)
2024-08-18T09:53:47+02:00: marked 1% (42 of 4150 index files)
...
2024-08-18T10:21:36+02:00: marked 100% (4150 of 4150 index files)
2024-08-18T10:21:36+02:00: Start GC phase2 (sweep unused chunks)
2024-08-18T10:21:49+02:00: processed 1% (23701 chunks)
...
2024-08-18T11:05:09+02:00: processed 99% (2409651 chunks)
2024-08-18T11:05:36+02:00: Removed garbage: 109.197 GiB
2024-08-18T11:05:36+02:00: Removed chunks: 116999
2024-08-18T11:05:36+02:00: Original data usage: 74.813 TiB
2024-08-18T11:05:36+02:00: On-Disk usage: 2.738 TiB (3.66%)
2024-08-18T11:05:36+02:00: On-Disk chunks: 2317035
2024-08-18T11:05:36+02:00: Deduplication factor: 27.33
2024-08-18T11:05:36+02:00: Average chunk size: 1.239 MiB
2024-08-18T11:05:36+02:00: TASK OK

Okay, the same ~30 minutes for phase1 plus 44 minutes for the actual removal.
