Understanding PBS garbage collection notification

This only says that the atime is not persisted to the server side instantaneously, but cached by the client for improved performance, in contrast to mtime and ctime. It does not say that the atime is never persisted on the server side.


While not recommended, it is perfectly fine to use NFS to back your datastore, as long as you can live with the known possible limitations, especially with respect to performance. AFAIK there are a lot of setups working just fine with NFS within the given constraints.
The way I read the documentation is that the client does not set atime; the server does, as a result of client actions it sees. When the client avoids performing those actions (reading things) because its cache already has all the information, the server never sees any action, so it never updates the atime.

Frankly, I think the proper solution is to 'touch' the chunk files as they are referenced during backups and as they are seen in the phase-one sweep, and then use the mtime to decide whether they are garbage, because mtime is generally considered reliable and atime is generally considered unreliable. But while I might be able to whip up a storage plugin that mounts the filesystem in an alternative way (something like rsync, rclone, or sftp) if these are pluggable, changing the way backups and garbage collection work seems a bit too much of a deep dive.
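Just to make the idea concrete, here is a rough sketch (the datastore path and the cutoff are made up, and this is not how PBS is actually implemented, just the shape of what I mean):
Code:
# rough sketch of the mtime idea (not how PBS is implemented; the
# datastore path and the cutoff below are made up for illustration)
import os
import time

CHUNK_DIR = "/datastore/.chunks"

def mark(referenced_chunks):
    # "phase 1": bump the timestamps of every chunk that is still referenced
    for path in referenced_chunks:
        os.utime(path, None)  # like touch(1): sets atime and mtime to now

def sweep(cutoff):
    # "phase 2": any chunk not touched since the cutoff is garbage
    for root, _dirs, files in os.walk(CHUNK_DIR):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_mtime < cutoff:
                os.remove(path)  # delete the unreferenced chunk

start = time.time()
mark([])                  # in reality this would be fed from the index files
sweep(start - 24 * 3600)  # e.g. keep anything touched within the last day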
 
Ah I guess you are referring to:
Code:
       The Linux client handles atime updates more loosely, however.
       NFS clients maintain good performance by caching data, but that
       means that application reads, which normally update atime, are
       not reflected to the server where a file's atime is actually
       maintained.

Well, that is apparently true, but the atime will be cached by the client, so this is okay for garbage collection: updating the chunks' atime in phase 1 will update the atime client-side, which is then seen in phase 2 when the cached value is read. Concurrent access to the datastore on the server side, however, must be avoided.
 
We also don't just read to update the atime, we explicitly touch the chunks, which (on normal file system implementations) even bypasses relatime and always sets the timestamp. If a file system ignores an explicit request to update a file's timestamps without throwing an error, it's pretty broken.
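Roughly, the explicit touch amounts to this, and a broken file system could be detected the same way (just an illustration; the chunk path is hypothetical):
Code:
# illustration only: an explicit utime() call (utimensat(2) under the
# hood) is not subject to relatime, and a file system that silently
# ignores it can be detected. the chunk path is hypothetical.
import os
import time

path = "/datastore/.chunks/0000/some-chunk"

t0 = time.time()
os.utime(path, None)  # explicit request: set atime and mtime to "now"

# if the timestamp did not move, atime-based GC would be unsafe here
if os.stat(path).st_atime < t0 - 1:  # 1s slack for coarse timestamps
    raise RuntimeError("file system ignored the timestamp update")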
 
Well, I made two changes yesterday: (a) (unintentionally) updated to PBS 3.3 as part of an apt update; and (b) created a new datastore on a local disk attached to the VM and changed the backup job to use that. This morning I had garbage collection messages from both the old datastore and the new one, and both show "normal" results:

NFS datastore, not touched by backup job in 12 hours:
Datastore: proxmox-backup
Task ID: UPID:fluffles-pbs:0000036A:000003D8:00000001:67591C50:garbage_collection:proxmox\x2dbackup:root@pam:
Index file count: 980
Removed garbage: 900.478 MiB
Removed chunks: 856
Removed bad chunks: 0
Leftover bad chunks: 0
Pending removals: 33.04 GiB (in 24733 chunks)
Original Data usage: 2.141 TiB
On-Disk usage: 3.022 GiB (0.14%)
On-Disk chunks: 3340
Deduplication Factor: 725.60
Garbage collection successful.

And report from the new datastore that is actively in use on the same schedule as the other:
Datastore: zfs-pool
Task ID: UPID:fluffles-pbs:0000036A:000003D8:00000000:67591C50:garbage_collection:zfs\x2dpool:root@pam:
Index file count: 108
Removed garbage: 0 B
Removed chunks: 0
Removed bad chunks: 0
Leftover bad chunks: 0
Pending removals: 0 B (in 0 chunks)
Original Data usage: 234.663 GiB
On-Disk usage: 12.238 GiB (5.22%)
On-Disk chunks: 8962
Deduplication Factor: 19.18
Garbage collection successful.

In any case, with the new attached datastore things seem to be working, so I'll use that going forward.

Thanks for all the input!
 
Glad it works for you now, too bad we could not identify the underlying cause of the previous garbage collection reports.
 
I replaced the Ubuntu kernel NFS server with nfs-ganesha, which is a lot slower, but now it does work. I have tested running the GC job and then the full verification, and there are no more errors.

So the problem is in the NFS server Ubuntu 24 uses by default, the kernel NFS server.
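For anyone wanting to try the same, a minimal nfs-ganesha export is roughly of this shape (the path and export ID are just examples, not my exact config):
Code:
# /etc/ganesha/ganesha.conf -- minimal VFS export
# (path and export ID are examples, not my exact config)
EXPORT
{
    Export_Id = 1;
    Path = /srv/pbs-datastore;
    Pseudo = /pbs-datastore;
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = VFS;
    }
}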
 
I'm revisiting this because I'm still a bit confused. I recreated my PBS in a Synology VM with an attached volume for storage. The storage pool is a zfs pool with atime enabled. I back up two clusters, one twice daily and the other once daily (at staggered times). The email I get from garbage collection shows a high deduplication factor, but still never shows any garbage removal. Here is the most recent report:
Datastore: zfs-pool
Task ID: UPID:fluffles-pbs:0000036A:000003D8:0000056F:677F57D0:garbage_collection:zfs\x2dpool:root@pam:
Index file count: 2609
Removed garbage: 0 B
Removed chunks: 0
Removed bad chunks: 0
Leftover bad chunks: 0
Pending removals: 0 B (in 0 chunks)
Original Data usage: 5.704 TiB
On-Disk usage: 55.174 GiB (0.94%)
On-Disk chunks: 46349
Deduplication Factor: 105.87

Here's the output of zpool list and zfs get atime:
Code:
root@fluffles-pbs:/pbs-backups# zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pbs-backups   254G  56.3G   198G        -         -     0%    22%  1.00x  ONLINE  -
root@fluffles-pbs:/pbs-backups# zfs get atime pbs-backups
NAME  PROPERTY  VALUE  SOURCE

Here is the output of checking atime on the .chunks (-amin -3600 matches files accessed within the last 3600 minutes, i.e. 60 hours):
Code:
root@fluffles-pbs:/pbs-backups# find .chunks -type f -amin -3600 | wc -l
47061
root@fluffles-pbs:/pbs-backups# find .chunks -type f -amin +3600 | wc -l
0

Here is the output of cat /run/proxmox-backup/active-operations/zfs-pool:
Code:
[{"pid":874,"starttime":984,"active_operations":{"read":0,"write":0}}]

So I'm stumped why even with local zfs storage I never see any garbage collection. Any further ideas?

Thanks!
John
 
Garbage only exists if you prune backup snapshots. The deduplication happens before the chunks hit the disk.
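For example (the datastore name is taken from your report; the group name vm/100 is just an illustration):
Code:
# preview what prune would remove for one backup group, then run it
proxmox-backup-client prune vm/100 --repository zfs-pool --keep-daily 14 --dry-run
proxmox-backup-client prune vm/100 --repository zfs-pool --keep-daily 14
# afterwards garbage collection can reclaim the chunks those snapshots
# referenced; it can also be started manually on the PBS host:
proxmox-backup-manager garbage-collection start zfs-pool
Note that chunks freed by prune first show up under "Pending removals" until they are older than the grace period (a bit over 24 hours); a later GC run then actually deletes them.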
 
Fabian, THANK YOU for clarifying that! From my reading, I thought garbage collection was a daily process that was related to deduplication. If it's related to pruning, everything makes sense now. Thanks for all your help!