[SOLVED] Garbage Collection taking weeks to complete

Adamg

Member
Feb 8, 2020
I have a Proxmox Backup Server off site that does a remote sync of an important datastore. I realized that the "remove vanished" option alone is not deleting data to keep it a mirror of the remote datastore, so I scheduled garbage collection. It ran through about 30%, has now slowed to a crawl, and looks like it will take weeks to complete.

If it takes weeks to complete, garbage collection will be running 24/7 just to delete data and keep this datastore a mirror of the important one. Is there any way to speed this up?
 
Thanks Dunuin for the reply. Yes, I am using 3 x 4 TB (WD RED) HDDs.

There is no way I would invest another $2000 into the offsite backup server since it is just an offsite backup of the onsite backup server. I wish the remote sync and remove vanished option would work without having to run a garbage collection on it again. The garbage collection is already done on the main backup server.... oh well, it is what it is...

I am interested in this ZFS special device thing. I've never heard of it and from a quick look it appears that I can add a special device to an existing ZFS pool.

Do you or anyone know if I could just add 2 x 120 GB SSDs as special devices to the ZFS pool with the 3 x 4 TB HDDs? And if so, would that actually speed anything up?
 
Yes, here adding a special device SSD sped up the GC from a few hours to a few minutes. And if I remember right, the rule of thumb is a special device with about 0.4% of the capacity of the normal vdevs. Here is a thread explaining how to find out how big your special device has to be: https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954

So two 120GB SSDs in a mirror should be fine. But you really want a mirror, because a special device is not a cache, and losing your special device also means losing all your data on the HDDs.
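
As a rough check of that rule of thumb against the 3 x 4 TB pool from this thread, here is a minimal sketch. The helper name special_size_gb is made up for illustration, and it uses the raw capacity of 12000 GB, which is an assumption; the usable capacity depends on the pool layout.

```shell
# Rule-of-thumb estimate only: special device ~ 0.4% of the normal vdev capacity.
# special_size_gb is a hypothetical helper, not a ZFS command.
special_size_gb() {
    # $1 = capacity of the normal vdevs in GB (integer)
    # 0.4% = 4/1000; integer arithmetic rounds down
    echo $(( $1 * 4 / 1000 ))
}

# 3 x 4 TB raw capacity, assumed to be 12000 GB
special_size_gb 12000
```

By this estimate a 120 GB mirror leaves plenty of headroom, as long as only metadata is stored on it.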

Also check that your WD Reds use CMR and not SMR. The newer "WD Red" drives without "Plus" or "Pro" use SMR and shouldn't be used with ZFS. SMR disks write fast at first (while writing to the CMR cache area) and become unusably slow as soon as the cache is full.
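
One way to check is to compare the drive's model string against the SMR models. The sketch below assumes that the 2 to 6 TB "WD Red" EFAX models are the SMR ones; that list is my assumption and should be verified against WD's own documentation. The helper name wd_red_recording is made up for illustration.

```shell
# Hypothetical helper: classify a WD Red model string.
# Assumption: the 2-6 TB "WD Red" EFAX models are SMR; verify against WD's published list.
wd_red_recording() {
    case "$1" in
        *WD20EFAX*|*WD30EFAX*|*WD40EFAX*|*WD60EFAX*) echo "SMR" ;;
        *) echo "not in the assumed SMR list" ;;
    esac
}

# The model string can be read on the host with lsblk (util-linux), e.g.:
#   lsblk -d -n -o NAME,MODEL
wd_red_recording "WDC WD40EFAX"
```

A result of "not in the assumed SMR list" only means the model did not match this short list, not that the drive is confirmed CMR.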
 
Fantastic help! thanks Dunuin!

I added 2 SSDs, and it was really easy to add them to my existing pool. I ran the command:

zpool add sync-datastore special mirror ata-KINGSTON_SUV400S37120G_82B9J921JH86GG ata-KINGSTON_SUV400S37120G_HK78BH3GKDKFHK

and it was done. And then running:

zpool list -v

showed the special device was part of the pool. very cool.

Would you recommend leaving the property "special_small_blocks" at its default value of 0? Or should I change it to 4K by running the command:

zfs set special_small_blocks=4K sync-datastore

I don't really understand it so I'll just trust your recommendation.

Thanks again for the help!
 
I would set special_small_blocks to 0. Otherwise the special device might store data in addition to metadata, and then you can easily run out of space.
Also keep in mind that no existing metadata will be moved from the HDD vdevs to the special device SSDs. So it will only get faster over time if you don't rewrite your data. For that you could either delete all backups and resync them from your primary PBS or, if you don't want to wait and your pool has more than half of its capacity free, you could move your datastore to another dataset and then move it back again so all metadata is rewritten.
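
To make the special_small_blocks recommendation above a bit more concrete, here is a small sketch of the rule as the ZFS documentation describes it: data blocks smaller than or equal to the property value go to the special device, and 0 turns that off for data blocks. This is only an illustration, not ZFS code, and the helper name data_block_on_special is made up.

```shell
# Hypothetical illustration of the documented rule, not ZFS code.
# Metadata goes to the special device regardless; this only concerns data blocks.
data_block_on_special() {
    # $1 = data block size in bytes, $2 = special_small_blocks in bytes (0 = disabled)
    if [ "$2" -gt 0 ] && [ "$1" -le "$2" ]; then
        echo yes
    else
        echo no
    fi
}

# The current value can be read with:
#   zfs get special_small_blocks sync-datastore
data_block_on_special 4096 0       # default setting
data_block_on_special 4096 4096    # with special_small_blocks=4K
```

With the default of 0, small data blocks stay on the HDDs, so the SSDs fill up with metadata only.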
 
Once again, thanks so much for the reply!

Unfortunately the datastore is 95% full. If I connect an external HDD and create a new datastore in the GUI, is it then possible to move the backup data to the new datastore and then back to get the metadata to use the special SSDs?

I see the datastore folders in /mnt/datastore/ so would I just run:

rsync -av --remove-source-files /mnt/datastore/sync-datastore/ /mnt/datastore/new-temp-datastore/

And then reverse the rsync after it finishes?
 
A ZFS pool uses copy-on-write and should therefore always keep about 20% of its space free. Otherwise it will become slow and fragment faster (and there is no way to defragment it except for moving everything off that pool and copying it back again).

Yes, the rsync to the temporary datastore and back should work. Just make sure to disable the sync and maintenance jobs so PBS won't write to that datastore while you move it.
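
The 20% free space guideline and the rsync round trip discussed above can be sketched as a dry run. The paths are the ones from this thread; pool_too_full and plan_round_trip are made-up helper names, and the script only prints the rsync commands instead of running them.

```shell
# Sketch only: nothing here moves data.

# Succeeds if the pool is above the ~80% fill level (less than 20% free).
# $1 = capacity as printed by: zpool list -H -o capacity sync-datastore  (e.g. "95%")
pool_too_full() {
    [ "${1%\%}" -gt 80 ]
}

# Prints (does not run) the two rsync passes for the round trip.
# $1 = datastore path, $2 = temporary datastore path
plan_round_trip() {
    echo "rsync -av --remove-source-files $1/ $2/"
    echo "rsync -av --remove-source-files $2/ $1/"
}

if pool_too_full "95%"; then
    echo "pool is above 80% full"
fi
plan_round_trip /mnt/datastore/sync-datastore /mnt/datastore/new-temp-datastore
```

Note that --remove-source-files deletes the transferred files but leaves the emptied directories behind on the source side.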
 
Thanks again for your help with this Dunuin. This made a huge difference. I did what we discussed above and the garbage collection speed has significantly improved. What was going to take weeks is now able to finish in a day!
 