Sorry for not answering sooner. I had a bad time restoring a very critical and huge SQL VM (more than 6 TB) from a file-level restore (bless the BURP backup and my paranoia about not relying on just one backup solution), and I completely forgot about this until now, when I decided to give PBS another try. I can't remember the exact error, and since I re-created everything I can no longer reproduce it, but the message was something like a checksum mismatch. All backups, except the CT backups, were corrupt. PBS didn't complain when creating them, only when restoring or browsing them, which is the worst-case scenario, because we don't really know since when they had been failing. I assume (and hope) they had been failing since the PBS 1 to 2 upgrade.
In my scenario I was using ZFS, and the pool was created from the PBS GUI itself. ZFS status reported no errors at all, and I was able to create and work with other file formats on that pool just fine. I rebooted the server, which had 260 days of uptime, and created backups manually… nothing worked until I deleted all the former backups of a VM and then started a new backup. Same storage, same connectivity, everything the same, just no old backups. I then deleted all the backups of all VMs, and backups were browsable and restorable once again.
I must say I keep just one backup per VM, so rotation should be quite easy. The fact that it failed on all VMs (but on none of the CTs) made me think it was perhaps something about ZFS and how it handled the qcow2 files from the source (just guessing; you devs will probably have a better idea).
That's why I destroyed the pool, created an mdadm software RAID instead, and mounted a directory on it as the datastore. I'm running a full backup of all my VMs right now. I think mdadm can outperform raidz because it should deliver more IOPS (preliminary tests with small VMs were promising). In my previous setup I had to disable verification on the datastore because of two huge VMs that take more than a week each to verify, so I wasn't able to run a backup again until verification finished… Lesson learned: we must enable verification and schedule the backup to run after it. (Wishlist for the PBS devs: make verification configurable per VM, not just per datastore, so we can handpick which VMs to exclude from verification because they take an eternity.)
All in all, this has been a very painful experience. I was very lucky to have another backup mechanism, although I had to install a VM from scratch, then all the software, and then restore the huge MSSQL DBs… If only I had been notified that the backups were in a bad state, I could have acted preventively, but as verification takes an eternity and is all-or-nothing, I went for the “nothing” after testing PBS at the beginning and seeing it work just as expected. Now I'd rather have an outdated (but verified) VM backup from two weeks ago than repeat this hell of a restore. I hope mdadm will be faster than ZFS. I know I will lose all the benefits of that file system, but I just need space and speed, and ZFS (with my hardware) seems to be quite slow, as I was getting the IOPS of a single HDD.
Well, months after my last comment, when I had reinstalled PBS on top of mdadm instead of a RAIDZ pool, I've just faced the same issue once again.
This time, though, I made sure verification had passed, but it still failed once again.
I think I'm giving up on PBS. Either I have very bad luck or it is far from being production ready, and when you need to restore a copy the most, you face this, leaving you in the deepest misery.
The former backup system, where you only need to specify a destination, is much more mature. I prefer not to have deduplication and other features rather than find, when I desperately need to restore a VM, that I cannot do it.