Garbage collection: marking used chunks takes a very long time at a certain percentage

vmware

On my PBS 2.4 server, the GC process took a very long time (almost 2 days) to mark used chunks from 14% through 23%.

Code:
2023-04-09T14:25:38: starting garbage collection on store store1
2023-04-09T14:25:38: Start GC phase1 (mark used chunks)
2023-04-09T14:30:45: marked 1% (30 of 2983 index files)
2023-04-09T14:33:12: marked 2% (60 of 2983 index files)
2023-04-09T14:35:05: marked 3% (90 of 2983 index files)
2023-04-09T14:36:33: marked 4% (120 of 2983 index files)
2023-04-09T14:37:51: marked 5% (150 of 2983 index files)
2023-04-09T14:41:01: marked 6% (179 of 2983 index files)
2023-04-09T14:43:53: marked 7% (209 of 2983 index files)
2023-04-09T14:48:31: marked 8% (239 of 2983 index files)
2023-04-09T14:50:37: marked 9% (269 of 2983 index files)
2023-04-09T14:54:03: marked 10% (299 of 2983 index files)
2023-04-09T14:55:44: marked 11% (329 of 2983 index files)
2023-04-09T15:00:08: marked 12% (358 of 2983 index files)
2023-04-09T15:06:29: marked 13% (388 of 2983 index files)
2023-04-09T15:06:44: marked 14% (418 of 2983 index files)
2023-04-09T16:04:58: marked 15% (448 of 2983 index files)
2023-04-09T21:22:14: marked 16% (478 of 2983 index files)
2023-04-10T02:35:50: marked 17% (508 of 2983 index files)
2023-04-10T10:42:11: marked 18% (537 of 2983 index files)
2023-04-10T16:03:02: marked 19% (567 of 2983 index files)
2023-04-10T20:57:47: marked 20% (597 of 2983 index files)
2023-04-11T02:54:07: marked 21% (627 of 2983 index files)
2023-04-11T08:23:09: marked 22% (657 of 2983 index files)
2023-04-11T09:02:43: marked 23% (687 of 2983 index files)
2023-04-11T09:03:14: marked 24% (716 of 2983 index files)
2023-04-11T09:03:45: marked 25% (746 of 2983 index files)
...
...
2023-04-11T09:40:38: TASK OK

On-disk data is only about 700 GB.

The same datastore is synced to another PBS server, which doesn't have this issue at all. (GC there took 11 minutes.)
 
what is the underlying storage like? HDDs or SSDs?
was there anything in the log during the GC?
 
Underlying storage is a 3TB USB HDD, ext4. I've been using this datastore for 1-2 years now and the slowdown happened recently.

Which logs should I check while GC is running? /var/log/messages?
 
better check /var/log/syslog or the journal (with journalctl)
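
For example, to follow the journal while the GC task runs (the unit names below are the standard PBS daemons on a default install; adjust the --since timestamp to your GC start time):

Code:
# follow all journal messages live while the GC task runs
journalctl -f
# or limit the view to the PBS daemons since the GC started
journalctl -u proxmox-backup-proxy.service -u proxmox-backup.service --since "2023-04-09 14:25"
# kernel messages often reveal USB resets or I/O errors on a failing disk
dmesg -T | grep -iE 'usb|i/o error|reset'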

Underlying storage is a 3TB USB HDD, ext4. I've been using this datastore for 1-2 years now and the slowdown happened recently.
it can happen that HDDs break and e.g. some blocks can't be accessed or take forever to access
also check the SMART values if the USB controller allows this (should then even be possible via the web UI)
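
For example with smartctl from the smartmontools package (/dev/sdX is a placeholder for the USB disk; many USB-SATA bridges only expose SMART data with SAT passthrough):

Code:
# list attached disks and the device type smartctl would use
smartctl --scan
# read SMART health and attributes (replace sdX with your USB disk)
smartctl -a /dev/sdX
# if the USB bridge needs explicit SAT passthrough
smartctl -a -d sat /dev/sdX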
 
Would running GC weekly (instead of the default daily) significantly increase the time taken for each run?

IIRC, I changed the GC to run weekly after first noticing GC took over 24 hours to complete.
 
Would running GC weekly (instead of the default daily) significantly increase the time taken for each run?
the GC always has the same amount of work to do, regardless of how often you run it:

1. it iterates over all indices and touches all chunks
2. it iterates over all chunks and deletes the ones that are older than the cutoff date

the only thing that can change if you let it run less often is that there are (depending on your prune settings) more chunks & indices to iterate over
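
If you want a rough feel for how much work a single GC run has on a datastore, you can count the index files (phase 1, .fidx/.didx) and the chunks (phase 2). The path below is only an example, adjust it to your datastore, and note that the second find can itself take a long time on a slow USB disk:

Code:
# index files phase 1 has to read
find /mnt/datastore/store1 \( -name '*.fidx' -o -name '*.didx' \) | wc -l
# chunk files phase 2 has to check against the cutoff
find /mnt/datastore/store1/.chunks -type f | wc -l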
 
