[SOLVED] Backup of a FreeBSD+ZFS VM fails verification

May 13, 2020
Hi community and PBS support.

I am experiencing a strange issue with PBS. I am obviously doing something wrong, but I can't figure out what.

PVE Details:
Code:
Cluster: No, single host
Version: pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-2-pve)
Storage: ZFS RAID10 - 4 HDDs + 2x mirrored Intel Optane for SLOG + 1x Samsung NVMe SSD for L2ARC
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 13:54:03 with 0 errors on Sun Apr 14 14:18:06 2024
config:

        NAME                                                 STATE     READ WRITE CKSUM
        data                                                 ONLINE       0     0     0
          mirror-0                                           ONLINE       0     0     0
            ata-TOSHIBA_MG08ACA16TE_**             ONLINE       0     0     0
            ata-TOSHIBA_MG08ACA16TE_**             ONLINE       0     0     0
          mirror-1                                           ONLINE       0     0     0
            ata-TOSHIBA_MG08ACA16TE_**             ONLINE       0     0     0
            ata-TOSHIBA_MG08ACA16TE_**             ONLINE       0     0     0
        logs
          mirror-3                                           ONLINE       0     0     0
            nvme-INTEL_SSDPEK1A118GA_**-part1  ONLINE       0     0     0
            nvme-INTEL_SSDPEK1A118GA_**-part1  ONLINE       0     0     0
        cache
          nvme-Samsung_SSD_970_PRO_512GB_**     ONLINE       0     0     0

errors: No known data errors

Code:
root@atlas:~# zfs get all data/data-encrypted
NAME                 PROPERTY              VALUE                  SOURCE
data/data-encrypted  type                  filesystem             -
data/data-encrypted  creation              Fri Feb 10  7:27 2023  -
data/data-encrypted  used                  2.51T                  -
data/data-encrypted  available             23.4T                  -
data/data-encrypted  referenced            200K                   -
data/data-encrypted  compressratio         1.00x                  -
data/data-encrypted  mounted               yes                    -
data/data-encrypted  quota                 none                   default
data/data-encrypted  reservation           none                   default
data/data-encrypted  recordsize            128K                   default
data/data-encrypted  mountpoint            /data/data-encrypted   default
data/data-encrypted  sharenfs              off                    default
data/data-encrypted  checksum              on                     default
data/data-encrypted  compression           on                     inherited from data
data/data-encrypted  atime                 off                    inherited from data
data/data-encrypted  devices               on                     default
data/data-encrypted  exec                  on                     default
data/data-encrypted  setuid                on                     default
data/data-encrypted  readonly              off                    default
data/data-encrypted  zoned                 off                    default
data/data-encrypted  snapdir               hidden                 default
data/data-encrypted  aclmode               discard                default
data/data-encrypted  aclinherit            restricted             default
data/data-encrypted  createtxg             40383                  -
data/data-encrypted  canmount              on                     default
data/data-encrypted  xattr                 on                     default
data/data-encrypted  copies                1                      default
data/data-encrypted  version               5                      -
data/data-encrypted  utf8only              off                    -
data/data-encrypted  normalization         none                   -
data/data-encrypted  casesensitivity       sensitive              -
data/data-encrypted  vscan                 off                    default
data/data-encrypted  nbmand                off                    default
data/data-encrypted  sharesmb              off                    default
data/data-encrypted  refquota              none                   default
data/data-encrypted  refreservation        none                   default
data/data-encrypted  guid                  3980846415803464505    -
data/data-encrypted  primarycache          all                    default
data/data-encrypted  secondarycache        all                    default
data/data-encrypted  usedbysnapshots       0B                     -
data/data-encrypted  usedbydataset         200K                   -
data/data-encrypted  usedbychildren        2.51T                  -
data/data-encrypted  usedbyrefreservation  0B                     -
data/data-encrypted  logbias               latency                default
data/data-encrypted  objsetid              1174                   -
data/data-encrypted  dedup                 off                    default
data/data-encrypted  mlslabel              none                   default
data/data-encrypted  sync                  standard               default
data/data-encrypted  dnodesize             legacy                 default
data/data-encrypted  refcompressratio      1.00x                  -
data/data-encrypted  written               200K                   -
data/data-encrypted  logicalused           2.52T                  -
data/data-encrypted  logicalreferenced     69.5K                  -
data/data-encrypted  volmode               default                default
data/data-encrypted  filesystem_limit      none                   default
data/data-encrypted  snapshot_limit        none                   default
data/data-encrypted  filesystem_count      none                   default
data/data-encrypted  snapshot_count        none                   default
data/data-encrypted  snapdev               hidden                 default
data/data-encrypted  acltype               off                    default
data/data-encrypted  context               none                   default
data/data-encrypted  fscontext             none                   default
data/data-encrypted  defcontext            none                   default
data/data-encrypted  rootcontext           none                   default
data/data-encrypted  relatime              on                     default
data/data-encrypted  redundant_metadata    all                    default
data/data-encrypted  overlay               on                     default
data/data-encrypted  encryption            aes-256-gcm            -
data/data-encrypted  keylocation           prompt                 local
data/data-encrypted  keyformat             passphrase             -
data/data-encrypted  pbkdf2iters           350000                 -
data/data-encrypted  encryptionroot        data/data-encrypted    -
data/data-encrypted  keystatus             available              -
data/data-encrypted  special_small_blocks  0                      default


VM config:
Code:
agent: 0
balloon: 0
boot: order=ide2;scsi0
cores: 8
cpu: host
description: Services status%3A **OPERATIONAL**
ide2: none,media=cdrom
memory: 12288
name: eos*****
net0: virtio=F2:EE:60:FB:**:**,bridge=vmbr0,firewall=1,tag=8
net2: virtio=5A:32:7D:5F:**:**,bridge=vmbr6,firewall=1
net3: virtio=CA:33:E6:7D:**:**,bridge=vmbr1,firewall=1,mtu=1
numa: 0
ostype: l26
protection: 1
rng0: source=/dev/urandom
scsi0: local-zfs-encrypted:vm-103-disk-1,discard=on,iothread=1,size=30G
scsi1: local-zfs-encrypted:vm-103-disk-2,discard=on,iothread=1,size=8G
scsi2: local-zfs-encrypted:vm-103-disk-0,discard=on,iothread=1,size=10T
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=6e574e5e-27c4-4ede-889d-*******
sockets: 1
tags: autoupdate;encrypted
vmgenid: 45a052b2-eb22-4629-9974-***********
watchdog: model=i6300esb,action=reset

PBS Details:
Version: proxmox-backup-server 3.2.2-1 running version: 3.2.2
Storage: Single 18TB Toshiba HDD. I tried XFS and ext4, and it is now on ZFS with copies=2. ZFS reports no errors, and SMART reports no errors.
Code:
pool: backups
 state: ONLINE
config:

        NAME                                    STATE     READ WRITE CKSUM
        backups                                 ONLINE       0     0     0
          ata-TOSHIBA_MG09ACA18TE_**  ONLINE       0     0     0

errors: No known data errors

Code:
root@atlas:~# zfs get all backups/backups
NAME             PROPERTY              VALUE                  SOURCE
backups/backups  type                  filesystem             -
backups/backups  creation              Thu May  9 10:07 2024  -
backups/backups  used                  2.17T                  -
backups/backups  available             14.1T                  -
backups/backups  referenced            2.17T                  -
backups/backups  compressratio         1.01x                  -
backups/backups  mounted               yes                    -
backups/backups  quota                 none                   default
backups/backups  reservation           none                   default
backups/backups  recordsize            128K                   default
backups/backups  mountpoint            /srv/backups/images    local
backups/backups  sharenfs              off                    default
backups/backups  checksum              on                     default
backups/backups  compression           on                     default
backups/backups  atime                 on                     default
backups/backups  devices               on                     default
backups/backups  exec                  on                     default
backups/backups  setuid                on                     default
backups/backups  readonly              off                    default
backups/backups  zoned                 off                    default
backups/backups  snapdir               hidden                 default
backups/backups  aclmode               discard                default
backups/backups  aclinherit            restricted             default
backups/backups  createtxg             9                      -
backups/backups  canmount              on                     default
backups/backups  xattr                 on                     default
backups/backups  copies                2                      local
backups/backups  version               5                      -
backups/backups  utf8only              off                    -
backups/backups  normalization         none                   -
backups/backups  casesensitivity       sensitive              -
backups/backups  vscan                 off                    default
backups/backups  nbmand                off                    default
backups/backups  sharesmb              off                    default
backups/backups  refquota              none                   default
backups/backups  refreservation        none                   default
backups/backups  guid                  5060076912392400749    -
backups/backups  primarycache          all                    default
backups/backups  secondarycache        all                    default
backups/backups  usedbysnapshots       0B                     -
backups/backups  usedbydataset         2.17T                  -
backups/backups  usedbychildren        0B                     -
backups/backups  usedbyrefreservation  0B                     -
backups/backups  logbias               latency                default
backups/backups  objsetid              388                    -
backups/backups  dedup                 off                    default
backups/backups  mlslabel              none                   default
backups/backups  sync                  standard               default
backups/backups  dnodesize             legacy                 default
backups/backups  refcompressratio      1.01x                  -
backups/backups  written               2.17T                  -
backups/backups  logicalused           2.21T                  -
backups/backups  logicalreferenced     2.21T                  -
backups/backups  volmode               default                default
backups/backups  filesystem_limit      none                   default
backups/backups  snapshot_limit        none                   default
backups/backups  filesystem_count      none                   default
backups/backups  snapshot_count        none                   default
backups/backups  snapdev               hidden                 default
backups/backups  acltype               off                    default
backups/backups  context               none                   default
backups/backups  fscontext             none                   default
backups/backups  defcontext            none                   default
backups/backups  rootcontext           none                   default
backups/backups  relatime              on                     default
backups/backups  redundant_metadata    all                    default
backups/backups  overlay               on                     default
backups/backups  encryption            off                    default
backups/backups  keylocation           none                   default
backups/backups  keyformat             none                   default
backups/backups  pbkdf2iters           0                      default
backups/backups  special_small_blocks  0                      default

There are 7 VMs in this backup schedule, but only 1 of them fails verification. It sits on a ZFS-encrypted zvol and runs FreeBSD 14.0, which itself uses ZFS. It has about 2TB of data and a 10TB disk allocated to it.

Errors:
Code:
2024-05-11T05:30:59+03:00: verify backups:vm/103/2024-05-10T20:01:27Z
2024-05-11T05:30:59+03:00:   check qemu-server.conf.blob
2024-05-11T05:30:59+03:00:   check fw.conf.blob
2024-05-11T05:30:59+03:00:   check drive-scsi2.img.fidx
2024-05-11T05:37:04+03:00: can't verify chunk, load failed - store 'backups', unable to load chunk '517aa60e4480771bd0560626b2834e36459c473b9d0ddfde9df3ed45de7a5eac' - Data blob has wrong CRC checksum.
2024-05-11T05:37:04+03:00: corrupted chunk renamed to "/srv/backups/images/backups/.chunks/517a/517aa60e4480771bd0560626b2834e36459c473b9d0ddfde9df3ed45de7a5eac.0.bad"
2024-05-11T09:53:19+03:00:   verified 1967791.48/2134788.00 MiB in 15739.33 seconds, speed 125.02/135.63 MiB/s (1 errors)
2024-05-11T09:53:19+03:00: verify backups:vm/103/2024-05-10T20:01:27Z/drive-scsi2.img.fidx failed: chunks could not be verified
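For anyone triaging a similar log, here is a minimal Python sketch that pulls the failed chunk digests out of a verify log and rebuilds the on-disk path. The function names `failed_chunks` and `chunk_path` are made up for illustration, and the `.chunks/<first 4 hex characters>/<digest>` layout is only inferred from the renamed-chunk line above, not taken from PBS documentation.

```python
import re

# Matches the "unable to load chunk '<64 hex digits>'" part of a verify log line.
CHUNK_RE = re.compile(r"unable to load chunk '([0-9a-f]{64})'")


def failed_chunks(log_text: str) -> list[str]:
    """Return digests of chunks that failed to load, in order, without duplicates."""
    seen = []
    for digest in CHUNK_RE.findall(log_text):
        if digest not in seen:
            seen.append(digest)
    return seen


def chunk_path(datastore_root: str, digest: str) -> str:
    """Build the chunk path using the layout seen in the log (an assumption)."""
    return f"{datastore_root}/.chunks/{digest[:4]}/{digest}"


# One line copied from the verify log above.
SAMPLE = (
    "2024-05-11T05:37:04+03:00: can't verify chunk, load failed - store 'backups', "
    "unable to load chunk "
    "'517aa60e4480771bd0560626b2834e36459c473b9d0ddfde9df3ed45de7a5eac' "
    "- Data blob has wrong CRC checksum."
)

if __name__ == "__main__":
    for digest in failed_chunks(SAMPLE):
        print(chunk_path("/srv/backups/images/backups", digest))
```

Run against the full task log, this gives a short list of digests that can be compared across verification runs to see whether the same chunk fails every time or different chunks fail each time.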
Code:
root@atlas:~# du -sh /srv/backups/images/backups/.chunks/517a/517aa60e4480771bd0560626b2834e36459c473b9d0ddfde9df3ed45de7a5eac.0.bad
4.1M    /srv/backups/images/backups/.chunks/517a/517aa60e4480771bd0560626b2834e36459c473b9d0ddfde9df3ed45de7a5eac.0.bad
root@atlas:~# ls -la /srv/backups/images/backups/.chunks/517a/517aa60e4480771bd0560626b2834e36459c473b9d0ddfde9df3ed45de7a5eac.0.bad
-rw-r--r-- 1 backup backup 4179962 May  9 10:50 /srv/backups/images/backups/.chunks/517a/517aa60e4480771bd0560626b2834e36459c473b9d0ddfde9df3ed45de7a5eac.0.bad
root@atlas:~#
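Note that the file's modification time (May 9 10:50) is earlier than the backup run that failed verification (started 2024-05-10 23:01), which suggests the chunk was written by an earlier run and then reused by the incremental backup. A rough Python sketch for listing every renamed chunk with its size and modification time, so the timestamps can be compared with the backup task times, might look like this. `find_bad_chunks` is a hypothetical helper, and the `*.bad` suffix is taken only from the log output above.

```python
import time
from pathlib import Path


def find_bad_chunks(chunk_root):
    """Return (path, size_bytes, mtime_string) for every '*.bad' file under chunk_root."""
    results = []
    for path in sorted(Path(chunk_root).rglob("*.bad")):
        if path.is_file():
            st = path.stat()
            # Local time, same granularity as `ls -la` shows.
            stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(st.st_mtime))
            results.append((str(path), st.st_size, stamp))
    return results


if __name__ == "__main__":
    # Chunk directory as it appears in the log above.
    for path, size, stamp in find_bad_chunks("/srv/backups/images/backups/.chunks"):
        print(f"{stamp}  {size:>10}  {path}")
```

If all bad chunks cluster around one backup run, that points at something that happened during that run rather than at slow decay on the backup disk.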

I tried destroying the backups zpool and recreating it, tried copies=2, tried XFS and ext4, tried giving PBS more threads to work with (more workers), and tried limiting the bandwidth so as not to overwhelm the disk. Nothing helped.

Please help me find what I am missing here.
 
Forgot to add, here is the backup log from PVE:

Code:
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2024-05-10 23:01:27
INFO: status = running
INFO: VM Name: eos****
INFO: include disk 'scsi0' 'local-zfs-encrypted:vm-103-disk-1' 30G
INFO: include disk 'scsi1' 'local-zfs-encrypted:vm-103-disk-2' 8G
INFO: include disk 'scsi2' 'local-zfs-encrypted:vm-103-disk-0' 10T
INFO: backup mode: snapshot
INFO: ionice priority: 3
INFO: creating Proxmox Backup Server archive 'vm/103/2024-05-10T20:01:27Z'
INFO: enabling encryption
INFO: started backup task 'a808f5f1-da69-40c2-8fd6-b52ad7e42e99'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (1.9 GiB of 30.0 GiB dirty)
INFO: scsi1: dirty-bitmap status: OK (drive clean)
INFO: scsi2: dirty-bitmap status: OK (1.0 GiB of 10.0 TiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 2.9 GiB dirty of 10.0 TiB total
INFO:   6% (196.0 MiB of 2.9 GiB) in 3s, read: 65.3 MiB/s, write: 58.7 MiB/s
INFO:  19% (572.0 MiB of 2.9 GiB) in 6s, read: 125.3 MiB/s, write: 90.7 MiB/s
INFO:  30% (904.0 MiB of 2.9 GiB) in 9s, read: 110.7 MiB/s, write: 97.3 MiB/s
INFO:  41% (1.2 GiB of 2.9 GiB) in 12s, read: 117.3 MiB/s, write: 94.7 MiB/s
INFO:  46% (1.4 GiB of 2.9 GiB) in 15s, read: 44.0 MiB/s, write: 44.0 MiB/s
INFO:  53% (1.6 GiB of 2.9 GiB) in 18s, read: 70.7 MiB/s, write: 70.7 MiB/s
INFO:  60% (1.8 GiB of 2.9 GiB) in 21s, read: 65.3 MiB/s, write: 65.3 MiB/s
INFO:  67% (2.0 GiB of 2.9 GiB) in 24s, read: 74.7 MiB/s, write: 74.7 MiB/s
INFO:  75% (2.2 GiB of 2.9 GiB) in 27s, read: 80.0 MiB/s, write: 80.0 MiB/s
INFO:  84% (2.5 GiB of 2.9 GiB) in 30s, read: 85.3 MiB/s, write: 85.3 MiB/s
INFO:  89% (2.6 GiB of 2.9 GiB) in 33s, read: 53.3 MiB/s, write: 53.3 MiB/s
INFO: 100% (2.9 GiB of 2.9 GiB) in 36s, read: 105.3 MiB/s, write: 104.0 MiB/s
INFO: Waiting for server to finish backup validation...
INFO: backup is sparse: 224.00 MiB (7%) total zero data
INFO: backup was done incrementally, reused 10.03 TiB (99%)
INFO: transferred 2.92 GiB in 39 seconds (76.7 MiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 103 (00:00:41)
 
Adding a reply because I found multiple articles like this one with no definitive answer. In my case it turned out not to be a PBS problem. The system had memory issues, which corrupted the backup only when the bitmap was clean (e.g., PVE hadn't been rebooted recently) and the bitmap of this VM was stored in the defective memory. I have ordered ECC memory and will replace the current modules with it. Hopefully these sporadic backup corruptions will disappear.
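For readers who suspect the same cause, a very crude userspace check is to hash an unchanging buffer several times and see whether the digest ever differs. The Python sketch below does this. `rehash_mismatches` is a made-up name, and the buffer size and round count are arbitrary. A clean result proves nothing, because the buffer may simply not land on the faulty memory, so a dedicated tester such as memtest86+ is the better tool.

```python
import hashlib
import os


def rehash_mismatches(size_bytes: int = 64 * 1024 * 1024, rounds: int = 5) -> int:
    """Fill a buffer with random bytes, then hash it repeatedly.

    The buffer never changes, so every digest should equal the first one.
    Returns the number of rounds whose digest differed from the reference.
    """
    buf = os.urandom(size_bytes)
    reference = hashlib.sha256(buf).digest()
    mismatches = 0
    for _ in range(rounds):
        if hashlib.sha256(buf).digest() != reference:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    bad = rehash_mismatches()
    print(f"{bad} mismatching rounds (0 is expected on healthy hardware)")
```

Any non-zero count on real hardware would be a strong hint that data is changing in memory or on the way to the CPU, which matches the kind of sporadic corruption described here.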
 
