PBS Verification and ZFS Scrubbing and ....

arubenstein

New Member
Jul 17, 2023
I try to be pragmatic with how I approach solutions and make sure that I am not "over-solving problems" or "providing solutions for problems that don't exist" ...

The environment is a PVE 8.x cluster of 3 nodes, using Ceph (on SSDs) for storage. This works perfectly: large nodes (40 cores, 768 GB RAM, hundreds of VMs) with lots of SSDs make for an excellent compute environment with good redundancy (Ceph keeps 3 copies).

Then, I have a PBS server running ZFS, with large rotational drives (20 TB) and some SSDs for a ZFS special device. Backups work well, very zippy. We keep 7 days, 4 weeks, and 3 months of retention.

My question concerns the need for PBS verification on top of ZFS scrubbing. I should mention that the rotational drives are set up as RAID-Z2.

Wikipedia, about ZFS, says (https://en.wikipedia.org/wiki/ZFS#RAID_("RAID-Z")), among other things:

"In addition to handling whole-disk failures, RAID-Z can also detect and correct silent data corruption, offering "self-healing data": when reading a RAID-Z block, ZFS compares it against its checksum, and if the data disks did not return the right answer, ZFS reads the parity and then figures out which disk returned bad data. Then, it repairs the damaged data and returns good data to the requestor.[36]"
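To make the quoted "self-healing" idea concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions: it uses a single XOR parity block (closer in spirit to RAID-Z1 than to the RAID-Z2 I run), SHA-256 stands in for the ZFS block checksum, and the names `write_stripe` and `read_stripe` are made up for illustration. Real RAID-Z is considerably more involved.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def write_stripe(data_blocks):
    """Return (disks, checksum): the data blocks plus one XOR parity block,
    and a checksum over the logical data (stored separately from the data)."""
    parity = reduce(xor_bytes, data_blocks)
    checksum = hashlib.sha256(b"".join(data_blocks)).hexdigest()
    return list(data_blocks) + [parity], checksum


def read_stripe(disks, checksum):
    """Return (data, repaired_index). repaired_index is None if the data
    disks already matched the checksum."""
    data = disks[:-1]
    if hashlib.sha256(b"".join(data)).hexdigest() == checksum:
        return b"".join(data), None
    # Checksum mismatch: assume each data disk in turn is the bad one,
    # rebuild it from the other disks plus parity, and re-check.
    for i in range(len(data)):
        others = [d for j, d in enumerate(disks) if j != i]
        candidate = reduce(xor_bytes, others)
        trial = data[:i] + [candidate] + data[i + 1:]
        if hashlib.sha256(b"".join(trial)).hexdigest() == checksum:
            disks[i] = candidate  # "self-heal": write the good block back
            return b"".join(trial), i
    raise IOError("unrecoverable: more damage than one parity block can fix")


disks, checksum = write_stripe([b"AAAA", b"BBBB", b"CCCC"])
disks[1] = b"BxBB"  # simulate silent corruption on one disk
data, repaired = read_stripe(disks, checksum)
print(data, repaired)
```

The point of the sketch is that the checksum tells the reader *that* something is wrong, and the parity lets it work out *which* block was wrong and rebuild it.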

The question: So, if the file system already provides that level of protection against bit rot, sector failures, and drive failures, is it really necessary to also perform PBS-level data verification?
 
"The question: So, if the file system already provides that level of protection against bit rot, sector failures, and drive failures, is it really necessary to also perform PBS-level data verification?"
It's not needed for checking the data integrity of your chunk files. But a scrub only checks that the files which exist did not get corrupted. It can't know if a whole file is missing for some reason. There are cases where backup snapshots stop working because a virus scanner quarantined some chunk files, or because atime updates weren't enabled and the GC deleted too many chunk files. To be protected against such cases you would still need to run verify jobs in PBS.
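A small toy model in Python may help show the difference. This is a sketch, not how PBS or ZFS are actually implemented: the chunk store is just a dict keyed by SHA-256 digest, a "snapshot" is a list of digests, and `scrub` and `verify` are hypothetical names chosen for illustration.

```python
import hashlib


def store_chunk(store: dict, data: bytes) -> str:
    """Save a chunk under its SHA-256 digest and return that digest."""
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def scrub(store: dict) -> list:
    """Filesystem-style check: walks only the chunks that still exist
    and reports the ones whose content no longer matches."""
    return [d for d, data in store.items()
            if hashlib.sha256(data).hexdigest() != d]


def verify(snapshot_index: list, store: dict) -> dict:
    """Backup-style check: walks the chunks a snapshot *references*,
    so it notices missing chunks as well as corrupted ones."""
    result = {"missing": [], "corrupt": []}
    for digest in snapshot_index:
        if digest not in store:
            result["missing"].append(digest)
        elif hashlib.sha256(store[digest]).hexdigest() != digest:
            result["corrupt"].append(digest)
    return result


store = {}
snapshot = [store_chunk(store, c) for c in (b"chunk-a", b"chunk-b", b"chunk-c")]

# Simulate a quarantined or wrongly garbage-collected chunk file.
del store[snapshot[1]]

print(scrub(store))             # nothing to report: the remaining chunks are intact
print(verify(snapshot, store))  # reports the deleted chunk as missing
```

In this model the scrub comes back clean after the deletion, because every chunk that is still there is fine. Only the check that starts from the snapshot's list of references can tell that the snapshot is no longer restorable.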
 