[SOLVED] Is verify task needed with ZFS-backed datastore ?

Altinea

Hello,
On PBS, when using a ZFS-backed datastore, should we really enable a verification job?

AFAIK, when there is some kind of redundancy (multiple copies, RAIDZ or mirroring), checksums are not disabled (why would you do that?) AND ZFS scrubbing runs on a regular basis (once a week, once a month?), ZFS is able to detect and recover from bit rot by itself.
With no redundancy (you like risk, huh?), bit rot can be detected but not corrected.

So the verify job seems to be of no use in this particular case, right?

This is probably the same with other checksumming filesystems (XFS, Btrfs?).

I forgot to mention: don't delete random chunks in your datastore, just let the GC do its job. But then again, who does that?

Thanks for your advice on this.
 
Hi,

yeah, if the storage level already has enough redundancy and does complete and full checks against bit rot (not only of metadata but of everything), then snapshot verification on the PBS level may be overkill.

ZFS scrubbing runs on a regular basis (once a week, once a month?)

ZFS scrubbing runs once a month by default, so a higher frequency like once a week may be more ideal in such a case.
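To illustrate, a scrub can be triggered or rescheduled like this (a sketch for Debian-based hosts; the pool name `rpool` and the cron schedule are assumptions):

```shell
# Run a scrub of the pool backing the datastore right away
# (pool name "rpool" is an example):
zpool scrub rpool

# Check scrub progress and the result of the last run:
zpool status rpool

# On Debian-based systems the stock monthly scrub comes from
# /etc/cron.d/zfsutils-linux; a weekly schedule could be set up
# instead via a custom cron entry, e.g. every Sunday at 02:00:
#   0 2 * * 0  root  /usr/sbin/zpool scrub rpool
```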

This is probably the same with other checksumming filesystems (XFS, Btrfs?).
XFS only checksums metadata, not all data, and is thus not equivalent to a verify job.

In general, IMO one can get a better safety net by:
* having an off-site archive mirror of the relevant PBS snapshots
* doing frequent end-to-end restore tests, to ensure that all relevant data and disks are actually backed up and that the restore process works and is familiar - which can be really valuable during outages, as one can get stressed more easily then.
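For the off-site mirror, one way is to let a second PBS pull from the primary via a remote plus a sync job; a rough CLI sketch (hostname, store names, token, and schedule are placeholders):

```shell
# On the off-site PBS: register the primary instance as a remote
# (host, auth-id, password, and fingerprint are placeholders):
proxmox-backup-manager remote create primary-pbs \
    --host pbs.example.com \
    --auth-id 'sync@pbs!offsite' \
    --password 'XXXXXX' \
    --fingerprint '64:d3:...:2e'

# Pull datastore "store1" from the remote into the local
# datastore "offsite" once a day:
proxmox-backup-manager sync-job create offsite-pull \
    --store offsite \
    --remote primary-pbs \
    --remote-store store1 \
    --schedule daily
```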
 
I forgot to mention: don't delete random chunks in your datastore, just let the GC do its job. But then again, who does that?

Nobody in their right mind would do this with intent, but a mistyped command, or multiple open SSH connections and typing something into the wrong console, can happen to any (maybe tired) "meatbag" - you only need to have bad luck once.

But yeah, verification won't really help against that either, while an off-site copy will.
 
Thanks for the advice.

That's another subject, but restores can't be done to another VMID, so testing them is a bit painful (read: complicated). Perhaps in a future version?

Best regards
 
Restoring to any VMID has worked just fine since pretty much forever.

Just start the restore from the storage's content panel, not from the VM's/CT's Backup panel, and you can select a new VMID.

You may want to disable the network links of such a test-restored guest, so that it doesn't come up and mess with production.
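Such a test restore can also be scripted from the PVE shell; a sketch (the PBS storage name, snapshot timestamp, target storage, and VMIDs are placeholders):

```shell
# Restore a backup of VM 100 into a fresh test VMID 9100,
# regenerating unique properties such as the MAC address
# (storage names and snapshot timestamp are placeholders):
qmrestore 'pbs-store:backup/vm/100/2024-01-07T02:00:00Z' 9100 \
    --unique 1 --storage local-zfs

# Keep the test guest off the production network by taking
# its NIC link down before booting it:
qm set 9100 --net0 'virtio,bridge=vmbr0,link_down=1'
qm start 9100
```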
 
Shame on me, you're absolutely right. I only ever tested from the VM's Backup panel, so I didn't know about this.

I don't think it's even needed to start the VM to test a restoration, but that's better, yeah.

This thread can now be closed, thanks again for sharing your thoughts.
 
The case where I used verification was when I needed to bootstrap a remote (cross-Atlantic) PBS, and the single-connection synchronization was just... plain molasses on a freezing cold winter's day...

The easiest fix was to spin up multiple rsync sessions, one per directory of the datastore, and then use the verification process to find the backups whose index files had been transferred but whose chunks had already been removed on the source. After those invalid backups were cleaned out, I had a decent set of backups to continue with synchronizing, pruning, and GC. The single-connection synchronization would have taken about 10x longer (high-latency links, and old CPUs that are slow at SSL, being the main contributors to the bad performance).
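For anyone attempting a similar seeded bootstrap: the chunk store is sharded into many subdirectories (`.chunks/0000` through `.chunks/ffff`), which makes it easy to fan out parallel rsync sessions; a rough sketch, with host and paths as placeholders:

```shell
# Source datastore path and rsync destination (placeholders):
SRC=/mnt/datastore/store1
DST=remote-pbs:/mnt/datastore/store1

# Copy the chunk subdirectories with 8 parallel rsync sessions,
# one .chunks/XXXX subdirectory per job:
ls "$SRC/.chunks" | xargs -P 8 -I{} \
    rsync -a "$SRC/.chunks/{}/" "$DST/.chunks/{}/"

# Then copy the snapshot metadata (index and owner files):
rsync -a --exclude '.chunks' "$SRC/" "$DST/"

# Afterwards, run a verify job on the destination to flag any
# snapshots whose chunks went missing on the source mid-transfer.
```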
 
