I like ZFS for its simplicity in growing, the possibility of fast disks in front of a slow disk, the filesystem per user including quota, and compression (although not very useful in this case indeed).
i agree that zfs is very nice, that's the reason we recommend using it, but with local hardware
Also, given the way PBS is built, I don't think anyone will be happy if you have a lot of users/backups with all those files in an ext4 filesystem.
why not?
Even though Ceph does do checksumming, it does not checksum the filesystem. ZFS does checksumming, and still PBS runs verify jobs to check the checksummed chunks.
i do not see the point in having checksums on the filesystem level. you now have 3 checksums (that all have to be calculated):
block (ceph)
fs (zfs)
file (pbs)
zfs checksumming is only advantageous if you have a zpool where the files with wrong checksums can be healed by the redundancy
otherwise the pbs file checksums are enough to detect something like bitrot, and ceph's checksums take care of keeping it consistent
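to illustrate why a file-level checksum is enough for detection, here is a minimal sketch of a content-addressed chunk store where each chunk is stored under its own sha-256 digest. this is not the actual pbs on-disk format or code; the `store_chunk` and `verify_store` helpers and the in-memory `dict` are assumptions made up for the example.

```python
import hashlib


def store_chunk(store: dict, data: bytes) -> str:
    """Store a chunk under the hex SHA-256 digest of its content."""
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def verify_store(store: dict) -> list:
    """Return the names of all chunks whose content no longer matches their name."""
    return [
        name
        for name, data in store.items()
        if hashlib.sha256(data).hexdigest() != name
    ]


# Hypothetical in-memory "datastore" with two chunks.
store = {}
good = store_chunk(store, b"chunk one")
bad = store_chunk(store, b"chunk two")

# Simulate bitrot: the stored bytes change, the name (digest) does not.
store[bad] = b"chunk twO"

corrupt = verify_store(store)
print(corrupt == [bad])
```

the check only detects the damaged chunk, it cannot repair it. repair needs redundancy underneath, which in this setup ceph already provides.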
So ZFS is very easy to run with a lot of different datastores/users, that is the main reason. It also scales better than ext4/xfs.
in such a case i'd use lvm + 1 lv per datastore + ext4 (maybe xfs?)
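a rough sketch of what that layout could look like, written as a small dry-run planner that only prints the commands instead of running them. the volume group name `pbs`, the datastore names, the sizes and the mount root are all hypothetical; the lv size takes the role of the per-user quota.

```python
def datastore_plan(vg, datastores, mount_root="/mnt/datastore"):
    """Build the command list for one LV + ext4 filesystem per datastore.

    `datastores` maps a datastore name to an LV size such as "500G".
    Nothing is executed; the caller decides what to do with the plan.
    """
    plan = []
    for name, size in datastores.items():
        dev = f"/dev/{vg}/{name}"
        path = f"{mount_root}/{name}"
        plan += [
            ["lvcreate", "-L", size, "-n", name, vg],  # one LV per datastore
            ["mkfs.ext4", dev],                        # plain ext4, no tuning
            ["mkdir", "-p", path],
            ["mount", dev, path],
            ["proxmox-backup-manager", "datastore", "create", name, path],
        ]
    return plan


# Hypothetical example: two users, each with their own datastore.
for cmd in datastore_plan("pbs", {"alice": "500G", "bob": "200G"}):
    print(" ".join(cmd))
```

growing a datastore later would then be an `lvextend -r` on the matching lv. persistent mounts (fstab or similar) are left out of the sketch.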
i did throw a mini benchmark together:
"toy" ceph cluster with 4 virtual nodes (1 osd per node on nvme)
(it will not be fast but the relative difference is interesting)
pbs is a vm with 4 cores, 16 gb ram
1st datastore is zfs on a ceph disk without compression
2nd datastore is plain ext4 on a ceph disk with no further tuning
i backed up a random vm with a ~30GiB disk and a ~28GiB directory (like a container)
| Datastore    | ~30GiB VM | ~28GiB Directory |
| ZFS on Ceph  | ~60MiB/s  | ~30MiB/s         |
| Ext4 on Ceph | ~220MiB/s | ~120MiB/s        |
so it seems that in such a setup ext4 is much faster than zfs, roughly 4x in both tests (the absolute values are irrelevant, the relative difference is what matters)
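the relative difference can be worked out directly from the numbers in the table. a minimal sketch, using the approximate throughput values measured above (the `speedup` helper is just for illustration):

```python
# Approximate throughput in MiB/s, taken from the mini benchmark above.
results = {
    "~30GiB VM": {"zfs": 60, "ext4": 220},
    "~28GiB directory": {"zfs": 30, "ext4": 120},
}


def speedup(zfs_rate, ext4_rate):
    """How many times faster ext4 on ceph was compared to zfs on ceph."""
    return ext4_rate / zfs_rate


for workload, rates in results.items():
    factor = speedup(rates["zfs"], rates["ext4"])
    print(f"{workload}: ext4 is about {factor:.1f}x faster")
# ~30GiB VM: ext4 is about 3.7x faster
# ~28GiB directory: ext4 is about 4.0x faster
```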
my general point is that using a feature-rich storage/fs such as ceph/zfs/qcow2/etc. has a performance penalty, and they should not be stacked (e.g. zfs on qcow2 (or the reverse) is also *very* slow), especially if one layer already has all the features of the other