Garbage collection on large NFS datastore gets stuck

guillaumev

New Member
Oct 13, 2023
Hi,
I have a setup that has been working for a couple of months: a secondary PBS server that syncs from my main PBS server every day. I'm running it with what I had on hand, so it's a typical HP server, and my data is stored on an NFS datastore on a Synology NAS. The datastore holds around 15 TB of data. Garbage collection is notoriously slow, taking 4 to 5 days, but I'm OK with that (it's a secondary after all). The server now runs PBS 3.3 (it previously ran PBS 2.4, but the update didn't solve the issue).
My last garbage collection, started in early January, began fine but then got stuck at 7%. I've restarted it multiple times, and it always gets stuck at the same percentage. However, when I look with lsof, it's not always stuck on the same backup.
For example, this is one of the lsof outputs:
Code:
# lsof /nas1-backups/
COMMAND    PID   USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
proxmox-b 1058 backup  mem    REG   0,47   298576 551092 /nas1-backups/ns/clustan/host/pve1/2022-08-28T03:00:01Z/root.pxar.didx (10.1.9.11:/volume1/pbs-backups)
proxmox-b 1058 backup   18r   REG   0,47   298576 551092 /nas1-backups/ns/clustan/host/pve1/2022-08-28T03:00:01Z/root.pxar.didx (10.1.9.11:/volume1/pbs-backups)
proxmox-b 1058 backup   19uW  REG   0,47        0    260 /nas1-backups/.lock (10.1.9.11:/volume1/pbs-backups)
This file doesn't seem to have an issue (I can copy it, display it, ...), and every time the garbage collection gets stuck, it's stuck on a different file. It's always in the clustan namespace (I think this is due to how my namespaces are named and how PBS traverses its backups), but not always on the same machine, and when it is the same machine, it's a different backup (taken at a different time).
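
To make the "is it really stuck on one file?" check repeatable, the lsof output can be filtered down to just the index files the GC process has open. This is only a rough sketch: the function name open_index_files is made up for illustration, and it assumes the default lsof column layout shown above, where NAME is the ninth field.

```shell
# Print the index files (.didx / .fidx) named in lsof output read from stdin.
# Assumption: default lsof columns, so the file name is field 9.
open_index_files() {
    awk '$9 ~ /\.(didx|fidx)$/ { print $9 }' | sort -u
}

# Hypothetical usage: sample once a minute. If the printed path keeps
# changing, GC is still walking index files rather than hanging on one.
# while sleep 60; do date; lsof /nas1-backups/ | open_index_files; done
```

If the same index file shows up for hours, that points to one large or slow-to-read index rather than a different file each time.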

Here's how my NFS share is mounted:
Code:
# mount -l|grep nfs
10.1.9.11:/volume1/pbs-backups on /nas1-backups type nfs4 (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.9.202,local_lock=none,addr=10.1.9.11)
And the exports from my Synology:
Code:
# cat /etc/exports

/volume1/pbs-backups    10.1.9.202/32(rw,async,no_wdelay,all_squash,insecure_locks,sec=sys,anonuid=34,anongid=34)
(ID 34 is the UID of the backup user on my PBS)

Any idea how to solve this?

Cheers,
 
Hi,
how did you determine that the garbage collection is stuck?

Note that the progress bar might not advance linearly over time, since it is updated based on the number of index files already scanned relative to the total number of index files. If an index file references a lot of chunks, progress updates will come more slowly than for index files referencing fewer chunks.
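
To get a feel for how coarse that percentage is, one could count the index files in the datastore. The sketch below is not how PBS itself computes progress; the function name count_index_files is hypothetical, and it assumes index files are the ones ending in .fidx or .didx and that chunks live under a .chunks directory, which is skipped so the count does not walk every chunk over NFS.

```shell
# Count the index files (.fidx and .didx) below a datastore path,
# skipping the .chunks directory (assumed to hold only chunk data).
count_index_files() {
    find "$1" -name .chunks -prune -o \
        -type f \( -name '*.fidx' -o -name '*.didx' \) -print | wc -l | tr -d ' '
}

# Hypothetical usage with the datastore path from this thread:
# count_index_files /nas1-backups
```

With a total of N index files, each percent of progress corresponds to roughly N/100 index files, so a few very large indexes can hold the display at the same value for a long time.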
 
Both sides could/should be tuned.
On the NFS server side, you have a very old NFS kernel server (which is still the current one from Synology) that only supports NFS 4.1 and a maximum NFS packet size of 131072 bytes. This can only be changed by using a current Linux distribution, possibly on other host hardware, but you would then lose the Synology web UI, if you care about that. In general, any PC/server with your (Synology) disks and e.g. Debian (maybe later upgraded to PVE), or even a PVE installation (used like a QDevice, but as a real node that adds one more quorum vote, without running VMs on it), would serve you better, with NFS 4.2 and a packet size of 1048576 bytes (server and client automatically negotiate the maximum).
On the NFS client side, the NFS read-ahead was unfortunately throttled hard a few years ago, down to 128 KB, to better support servers with high client counts.
It can be reset on the client *AFTER* the NFS mount is established: "echo 8192 > /sys/class/bdi/$(mountpoint -d <my_mount_path>)/read_ahead_kb"
After that you will get much faster NFS throughput.
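
The read-ahead one-liner above can be split into steps so the current value is visible before anything is changed. This is a small sketch only: read_ahead_file is a made-up helper name, and /nas1-backups is the mount path from the first post. The device ID it takes is the "major:minor" pair printed by mountpoint -d, which for this mount should correspond to the 0,47 in the DEVICE column of the lsof output.

```shell
# Build the sysfs read-ahead path for a device ID in "major:minor" form,
# as printed by `mountpoint -d <path>`.
read_ahead_file() {
    printf '/sys/class/bdi/%s/read_ahead_kb\n' "$1"
}

# Hypothetical usage (as root, after the NFS mount is established):
# f=$(read_ahead_file "$(mountpoint -d /nas1-backups)")
# cat "$f"            # current read-ahead in KB
# echo 8192 > "$f"    # value suggested in the post above
```

Since the sysfs entry belongs to the mounted filesystem, the value would most likely need to be set again after every remount or reboot.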
 
