[SOLVED] PBS garbage collection taking days

Fra

Code:
2022-03-24T06:45:56+01:00: starting garbage collection on store pbs-nas
2022-03-24T06:45:56+01:00: Start GC phase1 (mark used chunks)
2022-03-24T06:53:39+01:00: marked 1% (25 of 2472 index files)
2022-03-24T06:53:58+01:00: marked 2% (50 of 2472 index files)
2022-03-24T06:54:02+01:00: marked 3% (75 of 2472 index files)
2022-03-24T07:12:35+01:00: marked 4% (99 of 2472 index files)
2022-03-24T07:51:16+01:00: marked 5% (124 of 2472 index files)
2022-03-24T08:30:27+01:00: marked 6% (149 of 2472 index files)
2022-03-24T09:04:30+01:00: marked 7% (174 of 2472 index files)
2022-03-24T09:26:52+01:00: marked 8% (198 of 2472 index files)
2022-03-24T09:36:00+01:00: marked 9% (223 of 2472 index files)
2022-03-24T09:38:12+01:00: marked 10% (248 of 2472 index files)
2022-03-24T09:51:09+01:00: marked 11% (272 of 2472 index files)
2022-03-24T09:57:50+01:00: marked 12% (297 of 2472 index files)
2022-03-24T10:05:48+01:00: marked 13% (322 of 2472 index files)
2022-03-24T10:42:33+01:00: marked 14% (347 of 2472 index files)
2022-03-24T11:00:06+01:00: marked 15% (371 of 2472 index files)
2022-03-24T11:00:21+01:00: marked 16% (396 of 2472 index files)
2022-03-24T11:23:34+01:00: marked 17% (421 of 2472 index files)
2022-03-24T11:32:31+01:00: marked 18% (445 of 2472 index files)
2022-03-24T11:35:19+01:00: marked 19% (470 of 2472 index files)
2022-03-24T11:36:19+01:00: marked 20% (495 of 2472 index files)
2022-03-24T11:36:23+01:00: marked 21% (520 of 2472 index files)
2022-03-24T11:36:24+01:00: marked 22% (544 of 2472 index files)
2022-03-24T11:36:25+01:00: marked 23% (569 of 2472 index files)
2022-03-24T11:36:25+01:00: marked 24% (594 of 2472 index files)
2022-03-24T11:36:25+01:00: marked 25% (618 of 2472 index files)
2022-03-24T11:36:26+01:00: marked 26% (643 of 2472 index files)
2022-03-24T11:36:26+01:00: marked 27% (668 of 2472 index files)
2022-03-24T11:36:34+01:00: marked 28% (693 of 2472 index files)
....
2022-03-24T17:05:00+01:00: marked 76% (1879 of 2472 index files)
2022-03-24T18:30:19+01:00: marked 77% (1904 of 2472 index files)
2022-03-24T22:15:11+01:00: marked 78% (1929 of 2472 index files)
2022-03-25T02:48:54+01:00: marked 79% (1953 of 2472 index files)
2022-03-25T08:07:50+01:00: marked 80% (1978 of 2472 index files)
2022-03-25T11:53:03+01:00: marked 81% (2003 of 2472 index files)
2022-03-25T14:56:20+01:00: marked 82% (2028 of 2472 index files)
2022-03-25T22:35:22+01:00: marked 83% (2052 of 2472 index files)
2022-03-26T13:14:19+01:00: marked 84% (2077 of 2472 index files)
2022-03-26T14:13:36+01:00: marked 85% (2102 of 2472 index files)
2022-03-26T15:12:04+01:00: marked 86% (2126 of 2472 index files)
2022-03-26T18:12:42+01:00: marked 87% (2151 of 2472 index files)

The GC is taking days.

The setup is certainly not ordinary:
* PBS is a VM in Proxmox (everything fully updated)
* storage is NFS on a QNAP NAS (90% full)

This PBS syncs to a remote PBS where everything works fine (so it's a backup of backups): the sync is fine (it takes almost the whole night due to slow internet, but it completes), the (manual) prune also works fine, and there is no hardware issue with the storage.

I think the mistake we made was to schedule the GC to run too rarely: it started to go slower and slower, and now it really cannot finish (we paused the sync to let the GC complete).

We probably have to change the storage, but we need a month for the migration.


Do you have any suggestions on how to speed up the current GC? Will pruning old backups help?

thanks
 
A PBS GC needs to read the atime of every chunk file, and everything is stored as chunks with a maximum size of 4 MB each. So if you have 4 TB of backups, the GC needs to read at least 1,000,000 files. Reading that many chunks needs a lot of IOPS, and the accesses are largely random. That's something HDDs can't handle well, and the NFS share adds additional latency.
I've got a similar setup here (PBS VM with the datastore on NFS on HDDs), and there it wasn't unusual for a GC of 1 TB of backups to need 2-3 hours.
How much backup data do you have? If it's something like 20 TB of backups, that runtime would be normal. Otherwise it is indeed a bit slow.

What really helped here was adding SSDs to store the metadata. With that the GC is multiple orders of magnitude faster. But I guess that's not an option with your QNAP. You could check whether your QNAP supports SSD caching of metadata, so that the millions of atimes can be read from SSD instead of from the HDDs.
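
If you want a rough idea of how many chunk files your GC has to walk, you could count the files in the datastore's hidden .chunks folder, something like this (a sketch, assuming the datastore is mounted at /backup as shown later in this thread; the count itself will also take a while on a slow NFS share):

Code:
# count the chunk files in the datastore (adjust /backup to your datastore path);
# this walks the whole .chunks tree, so it is itself slow on HDDs over NFS
find /backup/.chunks -type f | wc -l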
 
thank you!

> How much backup data do you have?
Datastore is about 5.5 TiB :(


I see we have the option to add an SSD to the QNAP; we may try it.
 
I've changed the type of the NIC where the NFS is mounted from VirtIO to Intel E1000 and restarted the GC.

It is much faster so far: could that change be the reason for the better performance?

Code:
2022-03-27T20:13:31+02:00: starting garbage collection on store pbs-nas
2022-03-27T20:13:31+02:00: Start GC phase1 (mark used chunks)
2022-03-27T20:14:37+02:00: marked 1% (25 of 2486 index files)
2022-03-27T20:14:42+02:00: marked 2% (50 of 2486 index files)
2022-03-27T20:16:41+02:00: marked 3% (75 of 2486 index files)
2022-03-27T20:19:41+02:00: marked 4% (100 of 2486 index files)
2022-03-27T20:27:19+02:00: marked 5% (125 of 2486 index files)


BTW, this is the NFS mount inside the VM (Proxmox Backup Server):

Code:
10.10.10.13:/EXT001 /backup nfs rw,async,soft,intr,local_lock=all 0 0

and this is the benchmark (while the GC is running):

Code:
# proxmox-backup-client benchmark
SHA256 speed: 170.09 MB/s
Compression speed: 253.99 MB/s
Decompress speed: 352.50 MB/s
AES256/GCM speed: 71.63 MB/s
Verify speed: 116.99 MB/s
┌───────────────────────────────────┬───────────────────┐
│ Name                              │ Value             │
╞═══════════════════════════════════╪═══════════════════╡
│ TLS (maximal backup upload speed) │ not tested        │
├───────────────────────────────────┼───────────────────┤
│ SHA256 checksum computation speed │ 170.09 MB/s (8%)  │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 compression speed    │ 253.99 MB/s (34%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 decompression speed  │ 352.50 MB/s (29%) │
├───────────────────────────────────┼───────────────────┤
│ Chunk verification speed          │ 116.99 MB/s (15%) │
├───────────────────────────────────┼───────────────────┤
│ AES256 GCM encryption speed       │ 71.63 MB/s (2%)   │
└───────────────────────────────────┴───────────────────┘
 
Usually VirtIO should be way faster than E1000, because VirtIO is paravirtualized while the E1000 is fully emulated.

You could use iperf3 to check the network performance and fio to check the storage performance to find the bottleneck.
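
Something like this, as a rough sketch (10.10.10.13 and /backup are taken from your fstab line above; you'd need iperf3 running in server mode on the NAS or on another machine in that network):

Code:
# network throughput: start "iperf3 -s" on the other end, then from the PBS VM:
iperf3 -c 10.10.10.13

# random 4k read performance on the NFS-mounted datastore (creates a 1 GiB test file):
fio --name=nfs-randread --directory=/backup --size=1G --bs=4k --rw=randread \
    --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based --group_reporting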
 
Usually VirtIO should be way faster than E1000, because VirtIO is paravirtualized while the E1000 is fully emulated.

You are right: that change did not give any benefit.

You could use iperf3 to check the network performance and fio to check the storage performance to find the bottleneck.

We found the bottleneck: the NFS shared folder on the QNAP was stored on an *external USB-connected enclosure (QNAP TR-004)*. We moved the data to the main storage and the GC is now muuuuuch faster.
 
(...cut...)

What really helped here was adding SSDs to store the metadata. With that the GC is multiple orders of magnitude faster. (...cut...)

Do you mean ZFS special device?

Does it help with existing data? - My last GC took more than 3.5 days and I wonder what I can do to speed it up.
I have two NVMe disks that I use as L2ARC; the data is stored on 6 disks in RAIDZ2.

Do you think ditching L2ARC and dedicating those drives to the special device will reduce GC time?
 
Do you mean ZFS special device?
Yup.
Does it help with existing data?
No, only new metadata will be stored on the SSDs. But if your pool has enough free space, you could create a temporary dataset (each dataset is its own filesystem, which is the important part), stop all PBS services, copy all your datastore files to the new dataset, add your special device SSDs, copy everything back and start the PBS services again. That way the metadata of all the old backups should end up on the SSDs.
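A rough sketch of that procedure, with placeholder names (here the pool is called "tank" and the datastore lives in "tank/datastore"; double-check the paths and disk names for your own system before running anything like this):

Code:
# stop PBS so nothing touches the datastore while the files are copied around
systemctl stop proxmox-backup.service proxmox-backup-proxy.service

# temporary dataset on the same pool (a separate dataset = a separate filesystem)
zfs create tank/tmp-datastore
cp -a /tank/datastore/. /tank/tmp-datastore/

# add the mirrored special device SSDs
# (zpool may ask for -f because the mirror's redundancy differs from the raidz2;
#  note that a special vdev cannot be removed again from a pool with raidz vdevs)
zpool add tank special mirror /dev/disk/by-id/nvme-SSD1 /dev/disk/by-id/nvme-SSD2

# wipe the old copies and copy everything back, so the metadata is rewritten onto the special vdev
# (cp -a preserves the ownership/permissions of the chunk files)
find /tank/datastore -mindepth 1 -delete
cp -a /tank/tmp-datastore/. /tank/datastore/
zfs destroy tank/tmp-datastore

systemctl start proxmox-backup.service proxmox-backup-proxy.service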
- My last GC took more than 3.5 days and I wonder what I can do to speed it up.
I have two NVMe disks that I use as L2ARC; the data is stored on 6 disks in RAIDZ2.

Do you think ditching L2ARC and dedicating those drives to the special device will reduce GC time?
That depends. Do you run your L2ARC with secondarycache=metadata, so it only caches metadata and not the chunks as well?
If you never reboot your PBS and let the L2ARC store the metadata, this should result in similar metadata read performance.
The benefit of special devices would be that they also speed up backups, restores, verifies and metadata writes.
So special devices might be faster, because a GC first writes the atime (i.e. metadata) of each of the millions of chunk files and then reads the atime of all chunk files, deleting the chunks that haven't been touched for more than a day. An L2ARC can only cache reads, not writes.
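
To check or set that (the dataset name is just an example):

Code:
# see what the L2ARC is allowed to cache for the datastore's dataset
zfs get secondarycache tank/datastore

# let the L2ARC cache only metadata, not the chunk data itself
zfs set secondarycache=metadata tank/datastore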

Also keep in mind that a raidz2 isn't a great choice. PBS needs IOPS performance, which is exactly what HDDs are horrible at. That's why it's recommended to use only SSDs, as their IOPS performance is better by orders of magnitude. If you still want to use HDDs, it would be better to use a striped mirror, as only then does the IOPS performance increase with the number of disks. No matter how many HDDs you put in your raidz2, the IOPS performance will still be the same as a single disk, so probably something around 100 IO per second. The GC needs to do millions upon millions of more or less random IO, so this should make clear where the bottleneck is. If you've got 1 TB of backups, that results in about 1 million IO; with just 100 IOPS that needs 10,000 seconds. An SSD that does tens of thousands of IOPS could do it in seconds or a few minutes.
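
For comparison, a striped mirror built from the same six disks would look roughly like this (pool and disk names are just placeholders): three mirrored pairs, so you get the IOPS of three vdevs instead of one.

Code:
zpool create tank \
    mirror /dev/disk/by-id/hdd1 /dev/disk/by-id/hdd2 \
    mirror /dev/disk/by-id/hdd3 /dev/disk/by-id/hdd4 \
    mirror /dev/disk/by-id/hdd5 /dev/disk/by-id/hdd6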
 
>If you never reboot your PBS and let the L2ARC store the metadata, this should result in similar metadata read performance.

In theory, yes. In reality, not really: ZFS really sucks at caching metadata.
I'm running it that way, and it does accelerate things to some degree, but using a special vdev is way faster. ZFS drops metadata/dnode information from the cache when it shouldn't, and it also isn't pushing everything to the L2ARC, even after running for weeks. Go test it yourself and watch your IOPS/arcstats...

see https://github.com/openzfs/zfs/issues/10508 for example
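
For example, something like this to watch it on your own box (arcstat comes with the ZFS userland tools):

Code:
# ARC hit/miss statistics, refreshed every second
arcstat 1

# raw L2ARC hit/miss counters
grep -E '^l2_(hits|misses)' /proc/spl/kstat/zfs/arcstats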
 
