Slow garbage collection on PBS

budy

Member
Jan 31, 2020
205
12
23
56
Hi,

I am running a PBS on one PVE/Ceph node where all OSDs are 3.4 TiB WD REDs. This backup pool has become rather full, and I wonder whether that is the reason GC runs for days. There is almost no CPU or storage I/O load on the system, but there are quite a number of snapshots from my PVE cluster:

CT: 1 Groups, 25 Snapshots
Host: 1 Groups, 0 Snapshots
VM: 141 Groups, 3476 Snapshots
Storage pool Usage: 94.54% (17.78 TiB of 18.81 TiB)
Deduplication Factor: 31.66

The GC job has been running for 18+ hours and is still only at:


Code:
2021-06-10T14:59:39+02:00: starting garbage collection on store proxmoxBackup
2021-06-10T14:59:39+02:00: Start GC phase1 (mark used chunks)
2021-06-10T15:09:39+02:00: marked 1% (54 of 5377 index files)
2021-06-10T15:15:29+02:00: marked 2% (108 of 5377 index files)
2021-06-11T03:03:19+02:00: marked 3% (162 of 5377 index files)
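
As a rough cross-check on the log above, here is a minimal Python sketch that extrapolates how long phase 1 would take if the average per-index-file rate so far simply continued. The helper name `estimate_phase1` and the regular expressions are illustrative assumptions based only on the log lines quoted here; they are not part of PBS. The progress in the log is very uneven (the step from 2% to 3% took almost 12 hours), so a linear extrapolation like this is crude at best.

```python
import re
from datetime import datetime

# Log excerpt copied from the GC task output quoted above.
LOG = """\
2021-06-10T14:59:39+02:00: Start GC phase1 (mark used chunks)
2021-06-10T15:09:39+02:00: marked 1% (54 of 5377 index files)
2021-06-10T15:15:29+02:00: marked 2% (108 of 5377 index files)
2021-06-11T03:03:19+02:00: marked 3% (162 of 5377 index files)
"""

# Patterns are assumptions derived from the lines above, not a PBS log spec.
START_RE = re.compile(r"^(\S+): Start GC phase1")
MARK_RE = re.compile(r"^(\S+): marked \d+% \((\d+) of (\d+) index files\)$")


def estimate_phase1(log_text):
    """Return (elapsed, time per index file, estimated remaining time)."""
    start = None
    last = None
    for line in log_text.splitlines():
        m = START_RE.match(line)
        if m:
            start = datetime.fromisoformat(m.group(1))
            continue
        m = MARK_RE.match(line)
        if m:
            # Keep only the most recent progress line.
            last = (datetime.fromisoformat(m.group(1)),
                    int(m.group(2)), int(m.group(3)))
    if start is None or last is None:
        raise ValueError("log has no phase 1 start or no progress line")
    timestamp, done, total = last
    elapsed = timestamp - start
    per_file = elapsed / done                # average over all files so far
    remaining = per_file * (total - done)    # naive linear extrapolation
    return elapsed, per_file, remaining


if __name__ == "__main__":
    elapsed, per_file, remaining = estimate_phase1(LOG)
    print("elapsed so far:     ", elapsed)
    print("average per file:   ", per_file)
    print("estimated remaining:", remaining)
```

Feeding a newer copy of the task log into `estimate_phase1` would refine the estimate as more progress lines appear.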

There's nothing else going on on this pool; it is dedicated solely to PBS. The backup volume is, as stated above, on a Ceph pool which consists of 3 x 6 x 3.4 TiB WD REDs, so I expected no latency issues from that direction. I am currently a bit at a loss as to what I could do to get the performance up, and any ideas are greatly appreciated.

Thanks,
budy
 

budy

Okay, so… GC needs to read all chunks, and it looks like that is what it is doing. I checked further back in the logs and found some other occurrences where GC took 4 to 5 days to complete. I also took a look at iostat, and it seems that GC is doing this strictly sequentially. Maybe there could be some option to have that part parallelized, but at the moment it's reading from one OSD on any Ceph node in the pool at approx. 200 MB/s, and I am sure that this could be improved.
 

adoII

Active Member
Jan 28, 2010
166
15
38
I had the same problem. My solution was to add a mirror of small SSDs to the ZFS datastore as a "special device", copy all backup data, and delete the original data. Afterwards, garbage collection ran in about 30 minutes. Before I used the special device, it took about 20 hours.
 

budy

Thanks for chiming in, but in my case, I am running the PBS backup store on an SSD-only Ceph storage, so read IOPS shouldn't be an issue. Before this Ceph storage became my actual PBS datastore, it served as the working Ceph for my main PVE cluster, and the performance was really great.
 
May 4, 2021
13
0
1
43
Hi - just curious how you created a PBS datastore on CEPH?
Thanks
 
