Understanding PBS garbage collection notification

This only says that the atime is not persisted to the server side instantaneously, but cached by the client for improved performance, in contrast to mtime and ctime. It does not say that the atime is never persisted on the server side.


While not recommended, it is perfectly fine to use NFS to back your datastore, as long as you can live with the known possible limitations, especially with respect to performance. AFAIK there are a lot of setups working just fine with NFS within the given constraints.
The way I read the documentation is that the client does not set atime; the server does, as a result of client actions it sees. When the client avoids performing those actions (reading things) because its cache already has all the information, the server never sees any action, so it never updates the atime.

Frankly, I think the proper solution is to 'touch' the chunk files as they are referenced during backups and as they are seen in the phase-one sweep, and then use the mtime to decide whether they are garbage, because mtime is generally considered reliable and atime is generally considered unreliable. But while I might be able to whip up a storage plugin that mounts the filesystem in an alternative way (something like rsync, rclone, or sftp) if these are pluggable, changing the way backups and garbage collection work seems a bit too much of a deep dive.
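Just to make the idea concrete, here is a rough sketch (the datastore path and the cutoff are made up, and this is not how PBS is actually implemented, just the shape of what I mean):
Code:
# rough sketch of the mtime idea (not how PBS is implemented; the
# datastore path and the cutoff below are made up for illustration)
import os
import time

CHUNK_DIR = "/datastore/.chunks"

def mark(referenced_chunks):
    # "phase 1": bump the timestamps of every chunk that is still referenced
    for path in referenced_chunks:
        os.utime(path, None)  # like touch(1): sets atime and mtime to now

def sweep(cutoff):
    # "phase 2": any chunk not touched since the cutoff is garbage
    for root, _dirs, files in os.walk(CHUNK_DIR):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_mtime < cutoff:
                os.remove(path)  # delete the unreferenced chunk

start = time.time()
mark([])                  # in reality this would be fed from the index files
sweep(start - 24 * 3600)  # e.g. keep anything touched within the last day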
 
Ah I guess you are referring to:
Code:
       The Linux client handles atime updates more loosely, however.
       NFS clients maintain good performance by caching data, but that
       means that application reads, which normally update atime, are
       not reflected to the server where a file's atime is actually
       maintained.

Well, that is apparently true, but the atime will be cached by the client, so this is okay for garbage collection: updating the chunks' atime in phase 1 will update the atime client-side, which is then seen in phase 2 when the cached value is read. Concurrent access to the datastore on the server side, however, must be avoided.
 
We also don't just read to update the atime, we explicitly touch the chunks, which (on normal file system implementations) even bypasses relatime and always sets the timestamp. If a file system ignores an explicit request to update a file's timestamps without throwing an error, it's pretty broken.
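Roughly, the explicit touch amounts to this, and a broken file system could be detected the same way (just an illustration; the chunk path is hypothetical):
Code:
# illustration only: an explicit utime() call (utimensat(2) under the
# hood) is not subject to relatime, and a file system that silently
# ignores it can be detected. the chunk path is hypothetical.
import os
import time

path = "/datastore/.chunks/0000/some-chunk"

t0 = time.time()
os.utime(path, None)  # explicit request: set atime and mtime to "now"

# if the timestamp did not move, atime-based GC would be unsafe here
if os.stat(path).st_atime < t0 - 1:  # 1s slack for coarse timestamps
    raise RuntimeError("file system ignored the timestamp update")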
 
Well, I made two changes yesterday: (a) (unintentionally) updated to PBS 3.3 as part of an apt update; and (b) created a new datastore on a local disk attached to the VM and changed the backup job to use that. This morning I had garbage collection messages from both the old datastore and the new one, and both show "normal" results:

NFS datastore, not touched by backup job in 12 hours:
Datastore: proxmox-backup
Task ID: UPID:fluffles-pbs:0000036A:000003D8:00000001:67591C50:garbage_collection:proxmox\x2dbackup:root@pam:
Index file count: 980
Removed garbage: 900.478 MiB
Removed chunks: 856
Removed bad chunks: 0
Leftover bad chunks: 0
Pending removals: 33.04 GiB (in 24733 chunks)
Original Data usage: 2.141 TiB
On-Disk usage: 3.022 GiB (0.14%)
On-Disk chunks: 3340
Deduplication Factor: 725.60
Garbage collection successful.

And report from the new datastore that is actively in use on the same schedule as the other:
Datastore: zfs-pool
Task ID: UPID:fluffles-pbs:0000036A:000003D8:00000000:67591C50:garbage_collection:zfs\x2dpool:root@pam:
Index file count: 108
Removed garbage: 0 B
Removed chunks: 0
Removed bad chunks: 0
Leftover bad chunks: 0
Pending removals: 0 B (in 0 chunks)
Original Data usage: 234.663 GiB
On-Disk usage: 12.238 GiB (5.22%)
On-Disk chunks: 8962
Deduplication Factor: 19.18
Garbage collection successful.

In any case, with the new attached datastore things seem to be working, so I'll use that going forward.

Thanks for all the input!
 
Glad it works for you now, too bad we could not identify the underlying cause of the previous garbage collection reports.
 
I replaced the Ubuntu kernel NFS server with nfs-ganesha, which is a lot slower, but now it does work. I have tested running the GC job and then the full verification, and there are no more errors.

So the problem is in the NFS server Ubuntu 24 uses by default, the kernel NFS server.
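For anyone wanting to try the same, a minimal nfs-ganesha export is roughly of this shape (the path and export ID are just examples, not my exact config):
Code:
# /etc/ganesha/ganesha.conf -- minimal VFS export
# (path and export ID are examples, not my exact config)
EXPORT
{
    Export_Id = 1;
    Path = /srv/pbs-datastore;
    Pseudo = /pbs-datastore;
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = VFS;
    }
}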
 
I'm revisiting this because I'm still a bit confused. I recreated my PBS in a Synology VM with an attached volume for storage. The storage pool is a zfs pool with atime enabled. I back up two clusters, one twice daily and the other once daily (at staggered times). The email I get from garbage collection shows a high deduplication factor, but still never shows any garbage removal. Here is the most recent report:
Datastore: zfs-pool
Task ID: UPID:fluffles-pbs:0000036A:000003D8:0000056F:677F57D0:garbage_collection:zfs\x2dpool:root@pam:
Index file count: 2609
Removed garbage: 0 B
Removed chunks: 0
Removed bad chunks: 0
Leftover bad chunks: 0
Pending removals: 0 B (in 0 chunks)
Original Data usage: 5.704 TiB
On-Disk usage: 55.174 GiB (0.94%)
On-Disk chunks: 46349
Deduplication Factor: 105.87

Here's the output of zpool list and zfs get atime:
Code:
root@fluffles-pbs:/pbs-backups# zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pbs-backups   254G  56.3G   198G        -         -     0%    22%  1.00x  ONLINE  -
root@fluffles-pbs:/pbs-backups# zfs get atime pbs-backups
NAME  PROPERTY  VALUE  SOURCE

Here is the output of checking atime on the .chunks (-amin -3600 matches files accessed within the last 3600 minutes, i.e. 60 hours):
Code:
root@fluffles-pbs:/pbs-backups# find .chunks -type f -amin -3600 | wc -l
47061
root@fluffles-pbs:/pbs-backups# find .chunks -type f -amin +3600 | wc -l
0

Here is the output of cat /run/proxmox-backup/active-operations/zfs-pool:
Code:
[{"pid":874,"starttime":984,"active_operations":{"read":0,"write":0}}]

So I'm stumped why even with local zfs storage I never see any garbage collection. Any further ideas?

Thanks!
John
 
Garbage only exists if you prune backup snapshots. The deduplication happens before the chunks hit the disk.
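For example (the datastore name is taken from your report; the group name vm/100 is just an illustration):
Code:
# preview what prune would remove for one backup group, then run it
proxmox-backup-client prune vm/100 --repository zfs-pool --keep-daily 14 --dry-run
proxmox-backup-client prune vm/100 --repository zfs-pool --keep-daily 14
# afterwards garbage collection can reclaim the chunks those snapshots
# referenced; it can also be started manually on the PBS host:
proxmox-backup-manager garbage-collection start zfs-pool
Note that chunks freed by prune first show up under "Pending removals" until they are older than the grace period (a bit over 24 hours); a later GC run then actually deletes them.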
 
Fabian, THANK YOU for clarifying that! From my reading, I thought garbage collection was a daily process that was related to deduplication. If it's related to pruning, everything makes sense now. Thanks for all your help!