Garbage collection: marking used chunks takes a very long time at a certain percentage

vmware

On my PBS 2.4 server, the GC process took a very long time (almost 2 days) to mark used chunks from 14% through 23%.

Code:
2023-04-09T14:25:38: starting garbage collection on store store1
2023-04-09T14:25:38: Start GC phase1 (mark used chunks)
2023-04-09T14:30:45: marked 1% (30 of 2983 index files)
2023-04-09T14:33:12: marked 2% (60 of 2983 index files)
2023-04-09T14:35:05: marked 3% (90 of 2983 index files)
2023-04-09T14:36:33: marked 4% (120 of 2983 index files)
2023-04-09T14:37:51: marked 5% (150 of 2983 index files)
2023-04-09T14:41:01: marked 6% (179 of 2983 index files)
2023-04-09T14:43:53: marked 7% (209 of 2983 index files)
2023-04-09T14:48:31: marked 8% (239 of 2983 index files)
2023-04-09T14:50:37: marked 9% (269 of 2983 index files)
2023-04-09T14:54:03: marked 10% (299 of 2983 index files)
2023-04-09T14:55:44: marked 11% (329 of 2983 index files)
2023-04-09T15:00:08: marked 12% (358 of 2983 index files)
2023-04-09T15:06:29: marked 13% (388 of 2983 index files)
2023-04-09T15:06:44: marked 14% (418 of 2983 index files)
2023-04-09T16:04:58: marked 15% (448 of 2983 index files)
2023-04-09T21:22:14: marked 16% (478 of 2983 index files)
2023-04-10T02:35:50: marked 17% (508 of 2983 index files)
2023-04-10T10:42:11: marked 18% (537 of 2983 index files)
2023-04-10T16:03:02: marked 19% (567 of 2983 index files)
2023-04-10T20:57:47: marked 20% (597 of 2983 index files)
2023-04-11T02:54:07: marked 21% (627 of 2983 index files)
2023-04-11T08:23:09: marked 22% (657 of 2983 index files)
2023-04-11T09:02:43: marked 23% (687 of 2983 index files)
2023-04-11T09:03:14: marked 24% (716 of 2983 index files)
2023-04-11T09:03:45: marked 25% (746 of 2983 index files)
...
...
2023-04-11T09:40:38: TASK OK

On-disk data is only about 700 GB.

The same datastore is synced to another PBS server, which doesn't have this issue at all. (GC there took 11 minutes.)
 
what is the underlying storage like? HDDs or SSDs?
was there anything in the log during the GC?
 
Underlying storage is a 3TB USB HDD, ext4. I've been using this datastore for 1-2 years now and the slowdown happened recently.

Which logs should I check while GC is running? /var/log/messages?
 
better check /var/log/syslog or the journal (with journalctl)
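
For example, to follow the journal while the GC task runs (the unit names below are the standard PBS daemons on a default install; adjust the --since timestamp to your GC start time):

Code:
# follow all journal messages live while the GC task runs
journalctl -f
# or limit the view to the PBS daemons since the GC started
journalctl -u proxmox-backup-proxy.service -u proxmox-backup.service --since "2023-04-09 14:25"
# kernel messages often reveal USB resets or I/O errors on a failing disk
dmesg -T | grep -iE 'usb|i/o error|reset'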

Underlying storage is a 3TB USB HDD, ext4. I've been using this datastore for 1-2 years now and the slowdown happened recently.
it can happen that HDDs break and e.g. some blocks can't be accessed or take forever to access
also check the SMART values if the USB controller allows this (should then even be possible via the web UI)
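
For example with smartctl from the smartmontools package (/dev/sdX is a placeholder for the USB disk; many USB-SATA bridges only expose SMART data with SAT passthrough):

Code:
# list attached disks and the device type smartctl would use
smartctl --scan
# read SMART health and attributes (replace sdX with your USB disk)
smartctl -a /dev/sdX
# if the USB bridge needs explicit SAT passthrough
smartctl -a -d sat /dev/sdX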
 
Would running GC weekly (instead of the default daily) significantly increase the time taken for each run?

IIRC, I changed the GC to run weekly after first noticing GC took over 24 hours to complete.
 
Would running GC weekly (instead of the default daily) significantly increase the time taken for each run?
the GC always has the same amount of work to do, regardless of how often you run it:

1. it iterates over all indices and touches all chunks
2. it iterates over all chunks and deletes the ones that are older than the cutoff date

the only thing that can change if you let it run less often is that there are (depending on your prune settings) more chunks & indices to iterate over
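
If you want a rough feel for how much work a single GC run has on a datastore, you can count the index files (phase 1, .fidx/.didx) and the chunks (phase 2). The path below is only an example, adjust it to your datastore, and note that the second find can itself take a long time on a slow USB disk:

Code:
# index files phase 1 has to read
find /mnt/datastore/store1 \( -name '*.fidx' -o -name '*.didx' \) | wc -l
# chunk files phase 2 has to check against the cutoff
find /mnt/datastore/store1/.chunks -type f | wc -l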
 
