I try to be pragmatic with how I approach solutions and make sure that I am not "over-solving problems" or "providing solutions for problems that don't exist" ...
The environment is a PVE 8.x cluster of 3 nodes, using Ceph (on SSDs) for storage. This works perfectly: large nodes (40 cores, 768 GB RAM, hundreds of VMs) with lots of SSDs make for an excellent compute environment with good redundancy (Ceph, 3 copies).
Then, I have a PBS server running ZFS on large rotational drives (20 TB), plus some SSDs for a ZFS special device. Backups work well, very zippy. We keep 7 days, 4 weeks, and 3 months of retention.
My question is about the need for PBS verification on top of ZFS scrubbing. I should mention that the rotational drives are set up as a RAID-Z2.
Wikipedia, about ZFS, says (https://en.wikipedia.org/wiki/ZFS#RAID_("RAID-Z")), among other things:
"In addition to handling whole-disk failures, RAID-Z can also detect and correct silent data corruption, offering "self-healing data": when reading a RAID-Z block, ZFS compares it against its checksum, and if the data disks did not return the right answer, ZFS reads the parity and then figures out which disk returned bad data. Then, it repairs the damaged data and returns good data to the requestor.[36]"
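To make the quoted mechanism concrete, here is a toy Python sketch of "checksum mismatch, then rebuild from parity". It is only an illustration under loud assumptions: it uses a single XOR parity block (RAID-Z2 actually keeps two parity blocks with different math), SHA-256 as the checksum, and made-up helper names (`write_stripe`, `read_stripe`). It is not how ZFS is implemented internally.

```python
import hashlib
from functools import reduce


def checksum(data: bytes) -> bytes:
    # Assumption: SHA-256 stands in for whatever checksum the pool uses.
    return hashlib.sha256(data).digest()


def xor_blocks(blocks):
    # Byte-wise XOR of equally sized blocks.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def write_stripe(data_blocks):
    """Return (disks, expected_checksum); the last 'disk' holds XOR parity."""
    parity = xor_blocks(data_blocks)
    return list(data_blocks) + [parity], checksum(b"".join(data_blocks))


def read_stripe(disks, expected):
    """Return (data, repaired_index). repaired_index is None if nothing was wrong."""
    data = disks[:-1]
    if checksum(b"".join(data)) == expected:
        return b"".join(data), None
    # Checksum mismatch: assume each data disk in turn is the bad one,
    # rebuild it from the remaining disks plus parity, and re-check.
    for i in range(len(data)):
        others = [d for j, d in enumerate(disks) if j != i]
        candidate = xor_blocks(others)
        trial = data[:i] + [candidate] + data[i + 1:]
        if checksum(b"".join(trial)) == expected:
            disks[i] = candidate  # "self-heal": write the good block back
            return b"".join(trial), i
    raise IOError("unrecoverable: more damage than parity can repair")


if __name__ == "__main__":
    disks, expected = write_stripe([b"AAAA", b"BBBB", b"CCCC"])
    disks[1] = b"BxBB"  # simulate silent corruption on one disk
    data, repaired = read_stripe(disks, expected)
    print(data, "repaired disk:", repaired)
```

Note that in this sketch the repair only happens when the block is read, which is why a scrub (reading everything) is what exercises it across the whole pool.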
The question: if the file system already provides that level of protection against bit rot, sector failures, and drive failures, is it really necessary to also perform PBS-level data verification?