Garbage collection phase1 does not mark (m)any chunks

pallingerpeter

Jan 17, 2024
Currently, garbage collection tries to delete most of my chunks. I have about 4M chunks, at least half of which should be in use, and the GC log goes like this:

Code:
2024-09-27T17:01:19+02:00: starting garbage collection on store backup-cephfs
2024-09-27T17:01:19+02:00: Start GC phase1 (mark used chunks)
2024-09-27T17:09:29+02:00: Start GC phase2 (sweep unused chunks)
2024-09-27T17:10:08+02:00: processed 1% (46618 chunks)
At this point I stopped it manually, then restored the chunks from the synchronised server. 46k*100 ≈ 4.6M, so it was going to delete most of my chunks. It did so twice before, and both times I had to manually rsync the chunks back from the replica server.

As far as I understand, GC phase1 should update the access times for referenced chunks so phase2 can delete the chunks with old (>24h+5min) access times.
The problem is that no file access times are updated.
As far as I can tell, the number of chunks with fresh access times does not change during phase1.
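
For reference, this is roughly how the number of fresh chunks can be counted from outside the GC task, once before phase1 starts and once right after it ends. It is a minimal sketch that assumes GNU find; `count_fresh_chunks` is a made-up helper name, not a Proxmox Backup Server command, and the datastore path has to be filled in by hand.

```shell
#!/bin/sh
# Count files below a directory whose atime is newer than N minutes ago.
# count_fresh_chunks is a hypothetical helper, not part of PBS.
count_fresh_chunks() {
    dir=$1        # directory to scan, e.g. the .chunks directory of the datastore
    minutes=$2    # "fresh" = accessed less than this many minutes ago
    # -amin -N matches files accessed less than N minutes ago;
    # tr strips padding that some wc implementations add.
    find "$dir" -type f -amin "-$minutes" | wc -l | tr -d ' '
}
```

If the count is the same before and after phase1 (with the window chosen to cover the phase1 run time), no atimes were updated during the mark phase.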

The chunks are on a cephfs mount. The *.fidx files are on a different (ext4-on-ceph-rbd) mount. I understand that cephfs has an outstanding feature request (https://tracker.ceph.com/issues/43337) for correctly handling atime, and I confirmed that reading a file does not update its atime. However, manually updating chunk times still works:

Code:
# F=.chunks/0000/0000......... ; ls -lu $F ; cat $F >/dev/null; ls -lu $F ; touch -a $F ; ls -lu $F
-rw-r--r-- 2 backup backup 2605720 Sep 11 09:10 .chunks/0000/0000....
-rw-r--r-- 2 backup backup 2605720 Sep 11 09:10 .chunks/0000/0000....
-rw-r--r-- 2 backup backup 2605720 Sep 28 13:52 .chunks/0000/0000....

I looked into the code and it seems to use the utimensat libc call, so the cephfs atime bug should not be a concern (however, I do not really know Rust, so I may have misread something).
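
The manual test above can be turned into a repeatable check that does not touch any real chunk. This is only a sketch: `atime_update_works` is a hypothetical helper, it assumes GNU coreutils (`touch -d`, `stat -c`, `mktemp`), and it relies on `touch -a` setting the atime explicitly (on Linux this typically goes through utimensat or futimens) rather than on reads updating it.

```shell
#!/bin/sh
# Check whether an explicit atime update sticks on the filesystem holding DIR.
# atime_update_works is a hypothetical helper; it only modifies its own probe file.
atime_update_works() {
    dir=$1
    probe=$(mktemp "$dir/atime-probe.XXXXXX") || return 2
    # Push the probe's atime into the past and record it (seconds since epoch).
    touch -a -d '2 days ago' "$probe"
    before=$(stat -c %X "$probe")
    # Explicitly set the atime to "now", as touch -a does on a chunk file.
    touch -a "$probe"
    after=$(stat -c %X "$probe")
    rm -f "$probe"
    # Succeeds only if the atime actually moved forward.
    [ "$after" -gt "$before" ]
}
```

Run as the backup user against the chunk directory on the cephfs mount, a non-zero exit status would mean that explicit atime updates are not persisted there.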

My main question is: how can I effectively debug the GC procedure to find out what the problem is?
- Maybe it does not scan most images (phase1 is certainly fast, but it is possible to read the ~2GB of image files in roughly 9 minutes).
  - Can you make it print which image's chunks are being scanned?
- Maybe it cannot, or does not try to, update atimes.
  - Can you make it print which chunks' atimes are being set?

Thanks in advance!
 
