[SOLVED] Garbage Collect on large NFS datastore gets stuck

guillaumev

New Member
Oct 13, 2023
Hi,
I have a setup which has been working for a couple of months: a secondary PBS server which syncs my main PBS server every day. I'm running it with what I had on hand, so it's a typical HP server, and my data is stored on an NFS datastore on a Synology NAS. The datastore holds around 15TB of data. Garbage collections are notoriously slow, but I'm OK with that (it's a secondary after all) - 4 to 5 days. My PBS is now running PBS 3.3 (previously it was running PBS 2.4, but the update didn't solve the issue).
My last garbage collection, from early January, started fine but then got stuck at 7%. I've restarted it multiple times, and it always gets stuck at the same percentage. However, when I look with lsof, it isn't always stuck on the same backup.
For example, this is one of the outputs of lsof:
Code:
# lsof /nas1-backups/
COMMAND    PID   USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
proxmox-b 1058 backup  mem    REG   0,47   298576 551092 /nas1-backups/ns/clustan/host/pve1/2022-08-28T03:00:01Z/root.pxar.didx (10.1.9.11:/volume1/pbs-backups)
proxmox-b 1058 backup   18r   REG   0,47   298576 551092 /nas1-backups/ns/clustan/host/pve1/2022-08-28T03:00:01Z/root.pxar.didx (10.1.9.11:/volume1/pbs-backups)
proxmox-b 1058 backup   19uW  REG   0,47        0    260 /nas1-backups/.lock (10.1.9.11:/volume1/pbs-backups)
This file doesn't seem to have an issue (I can copy it, display it ...), and every time the GC gets stuck, it's stuck on a different file - always in the clustan namespace (but I think this is due to how my namespaces are named and how PBS traverses its backups), though not always on the same machine, and when it is the same machine, it's on different backups (at different times).

Here's how my NFS share is mounted:
Code:
# mount -l|grep nfs
10.1.9.11:/volume1/pbs-backups on /nas1-backups type nfs4 (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.9.202,local_lock=none,addr=10.1.9.11)
And the exports from my Synology:
Code:
# cat /etc/exports

/volume1/pbs-backups    10.1.9.202/32(rw,async,no_wdelay,all_squash,insecure_locks,sec=sys,anonuid=34,anongid=34)
(ID 34 is the backup user's UID on my PBS)
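For reference, the UID/GID that anonuid/anongid must squash to can be checked on the PBS host itself (assuming the standard backup user that PBS ships with):
Code:
# on the PBS host: the uid/gid the export's anonuid/anongid should map to
id backup
# uid=34(backup) gid=34(backup) groups=34(backup)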

Any idea on how to solve this?

cheers,
 
Hi,
how did you determine that the garbage collection is stuck?

Note that the progress bar might not update linearly in time, as it is updated based on the number of already scanned index files with respect to the total number of index files. If these index files reference a lot of chunks, then progress updates will be slower compared to index files referencing fewer chunks.
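If you want to double-check that it is still moving, one low-tech option (this is not a PBS feature, just a plain shell loop, and it assumes lsof is installed) is to watch which index file the GC process currently has open; if that file changes over time, phase 1 is still progressing:
Code:
# the open .fidx/.didx index file should change over time while GC phase 1 runs
watch -n 60 'lsof /nas1-backups/ | grep -E "\.(fidx|didx)"'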
 
Both sides could/should be tuned.
On the NFS server side you have a very "old" NFS server kernel (which is still the current one from Synology) that only supports NFSv4.1 and an NFS max packet size of 131072 bytes. This could be changed by running any current Linux distro, possibly on other hardware, but then you lose the Synology web UI if you want that. In general, any PC/server with your (Synology) disks running e.g. Debian (maybe upgraded to PVE), or even a PVE installation (used "like a Q-device, but being a real node for one higher quorum", without running VMs on it), would serve you better, with NFSv4.2 and a packet size (auto-negotiated between server and client up to the maximum) of 1048576 bytes.
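For illustration only (the Synology here tops out at NFSv4.1, so this would only apply after moving to a newer NFS server; IP and paths are taken from the mount output above), a client mount against a v4.2-capable server could look like this:
Code:
# vers=4.2 plus 1 MiB rsize/wsize; the kernel negotiates these down if the
# server supports less, so the values are upper bounds, not guarantees
mount -t nfs -o vers=4.2,rsize=1048576,wsize=1048576,hard,proto=tcp \
      10.1.9.11:/volume1/pbs-backups /nas1-backups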
On the NFS client side, the read-ahead was unfortunately heavily throttled a few years ago to 128kb, to better support high client counts on the server.
This can be reset on the client *AFTER* the NFS mount is established: "echo 8192 > /sys/class/bdi/$(mountpoint -d <my_mount_path>)/read_ahead_kb"
After that you will get much faster NFS throughput.
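As a concrete sketch for the mount from this thread (path taken from the original post; note the value resets whenever the share is remounted, so it has to be reapplied after every mount):
Code:
# check the current read-ahead (in KiB) of the NFS mount's backing device
MNT=/nas1-backups
cat /sys/class/bdi/$(mountpoint -d "$MNT")/read_ahead_kb
# raise it to 8 MiB after the mount is established (not persistent across remount/reboot)
echo 8192 > /sys/class/bdi/$(mountpoint -d "$MNT")/read_ahead_kb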
 
Hi,
how did you determine that the garbage collection is stuck?

Note that the progress bar might not update linearly in time, as it is updated based on the number of already scanned index files with respect to the total number of index files. If these index files reference a lot of chunks, then progress updates will be slower compared to index files referencing fewer chunks.
Thanks for the heads-up. I thought it was stuck because the task log was the following:
Code:
2025-01-17T17:22:23+01:00: starting garbage collection on store nas1-backups
2025-01-17T17:22:23+01:00: Start GC phase1 (mark used chunks)
2025-01-17T17:23:25+01:00: marked 1% (150 of 14938 index files)
2025-01-17T17:27:36+01:00: marked 2% (299 of 14938 index files)
2025-01-17T17:28:46+01:00: marked 3% (449 of 14938 index files)
2025-01-17T17:30:34+01:00: marked 4% (598 of 14938 index files)
2025-01-17T17:33:48+01:00: marked 5% (747 of 14938 index files)
2025-01-17T17:36:47+01:00: marked 6% (897 of 14938 index files)
2025-01-17T17:40:21+01:00: marked 7% (1046 of 14938 index files)

And then nothing was added for several hours.
But looking at my old (successful) garbage collections, I saw that this was already happening, with a huge time difference at some point (1% taking multiple days). Which makes sense, because I have a huge VM (~2TB of data), so a lot of referenced chunks, and over NFS it can take quite a long time to process everything.
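For what it's worth, here is a rough back-of-the-envelope sketch of why a single large VM dominates phase 1 (the 4 MiB fixed chunk size for VM images and the ~1 ms per chunk-marking operation over NFS are assumptions on my part):
Code:
# 2 TiB image / 4 MiB fixed chunks = number of chunks referenced by one index
echo "$(( 2 * 1024 * 1024 / 4 )) chunks referenced by one index"             # 524288
# at ~1 ms per atime update over NFS, marking one such index takes minutes
echo "$(( 2 * 1024 * 1024 / 4 / 1000 / 60 )) minutes to mark that one index" # ~8
# with dozens of snapshots of the same VM, phase 1 alone can stretch into days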
I've let this garbage collection run and it's now almost done, thanks for your reply!
 
Both sides could/should be tuned.
On the NFS server side you have a very "old" NFS server kernel (which is still the current one from Synology) that only supports NFSv4.1 and an NFS max packet size of 131072 bytes. This could be changed by running any current Linux distro, possibly on other hardware, but then you lose the Synology web UI if you want that. In general, any PC/server with your (Synology) disks running e.g. Debian (maybe upgraded to PVE), or even a PVE installation (used "like a Q-device, but being a real node for one higher quorum", without running VMs on it), would serve you better, with NFSv4.2 and a packet size (auto-negotiated between server and client up to the maximum) of 1048576 bytes.
On the NFS client side, the read-ahead was unfortunately heavily throttled a few years ago to 128kb, to better support high client counts on the server.
This can be reset on the client *AFTER* the NFS mount is established: "echo 8192 > /sys/class/bdi/$(mountpoint -d <my_mount_path>)/read_ahead_kb"
After that you will get much faster NFS throughput.

I would like to keep my Synology, mostly because I don't have another system which is rackable and can hold 8 disks, so installing something else on it or using the disks in another system is out of the question. Do you think iSCSI would be better in terms of performance, or would it be similar to NFSv4.1?
 
If you don't already have block storage in place for iSCSI, I would never (again) consider using iSCSI, given how sensitive it is and how much setup it takes. NFS is dead easy to use in general, and extremely easy for exchanging storage, moving VM images around, etc. In throughput, NFS (with HW RAID + XFS) gets close to the disks' specs; for IOPS, the filesystem cache sits on the NFS server and the NFS cache on the client as well. NFS storage can also be used for regular data files, just via different exports, without stranded capacity. Nearly any hardware and nearly any OS works for an NFS server, and switching any of them at any time is easy too. NFS is a no-brainer anytime, for everyone.