PBS PVE Fail Backup

Hi Fabian,

Yes, you could say that those two VMs are especially active. One of them runs Zabbix, so data is processed continuously.
The only other difference might be the size: they are the two largest VMs I have (600GB and 50GB of hard disks).
Anyway, I waited 24h, ran the prune on the datastore, manually deleted the vm/[ID] folders that contained only the owner file, and tried the backup again; this time it completed successfully. Nothing changed since yesterday, and I didn't notice any ISP problems that might have caused yesterday's failure...
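In case it helps anyone else, this is roughly how such leftover group directories can be spotted before deleting them - a minimal sketch, and the datastore path is only an example that has to be adapted:

Code:
# example datastore path - adjust to your own setup
DATASTORE=/mnt/datastore/MyStore
# list vm group directories that contain nothing but the 'owner' file
find "$DATASTORE/vm" -mindepth 1 -maxdepth 1 -type d \
    -exec sh -c '[ "$(ls -A "$1")" = owner ]' _ {} \; -print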

"Tutto è bene ciò che finisce bene" we say in Italy... bye!
 
Hi,

I don't see any follow-up to this, but we're having precisely the same problem.

This is a brand new PBS installation - all updates are done - and the zpool layout is as follows:

Code:
root@pbs-01:~# zpool status
  pool: PBS01-Datastore
 state: ONLINE
  scan: scrub repaired 0B in 00:17:24 with 0 errors on Tue Feb 22 19:40:51 2022
config:

        NAME             STATE     READ WRITE CKSUM
        PBS01-Datastore  ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            sdu          ONLINE       0     0     0
            sdv          ONLINE       0     0     0
            sdw          ONLINE       0     0     0
          mirror-1       ONLINE       0     0     0
            sdaa         ONLINE       0     0     0
            sdab         ONLINE       0     0     0
            sdac         ONLINE       0     0     0
          mirror-2       ONLINE       0     0     0
            sdad         ONLINE       0     0     0
            sdae         ONLINE       0     0     0
            sdaf         ONLINE       0     0     0
          mirror-3       ONLINE       0     0     0
            sdag         ONLINE       0     0     0
            sdah         ONLINE       0     0     0
            sdai         ONLINE       0     0     0
          mirror-4       ONLINE       0     0     0
            sdaj         ONLINE       0     0     0
            sdak         ONLINE       0     0     0
            sdal         ONLINE       0     0     0
          mirror-5       ONLINE       0     0     0
            sdi          ONLINE       0     0     0
            sdj          ONLINE       0     0     0
            sdk          ONLINE       0     0     0
          mirror-6       ONLINE       0     0     0
            sdl          ONLINE       0     0     0
            sdm          ONLINE       0     0     0
            sdn          ONLINE       0     0     0
          mirror-7       ONLINE       0     0     0
            sdo          ONLINE       0     0     0
            sdp          ONLINE       0     0     0
            sdq          ONLINE       0     0     0
          mirror-8       ONLINE       0     0     0
            sdr          ONLINE       0     0     0
            sds          ONLINE       0     0     0
            sdt          ONLINE       0     0     0
        special
          mirror-9       ONLINE       0     0     0
            sdd          ONLINE       0     0     0
            sde          ONLINE       0     0     0
            sdf          ONLINE       0     0     0
        logs
          mirror-10      ONLINE       0     0     0
            sdg          ONLINE       0     0     0
            sdh          ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:13 with 0 errors on Tue Feb 22 19:23:31 2022
config:

        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    ONLINE       0     0     0
          mirror-0                                               ONLINE       0     0     0
            ata-SAMSUNG_MZ7KM960HAHP-00005_S2HTNX0J400741-part3  ONLINE       0     0     0
            ata-SAMSUNG_MZ7KM960HAHP-00005_S2HTNX0J400728-part3  ONLINE       0     0     0
            ata-SAMSUNG_MZ7KM960HAHP-00005_S2HTNX0H413627-part3  ONLINE       0     0     0

errors: No known data errors

The 'rpool' contains the OS (a 3-way SSD mirror), and PBS01-Datastore holds the backup datastore, built from 3-way HDD mirrors plus an SSD-based mirrored special device and a mirrored SLOG/ZIL device (housed in a 44-bay Supermicro JBOD with HGST 12TB drives).

The PBS 'compute' node is a Supermicro Ultra with dual Intel Xeon Gold 6150 CPUs and 384GB RAM.

So on paper we should have no performance issues, and the errors we are seeing occur purely during testing, not production usage.

As an example, we are seeing the following in our logs, along with failing backups:

Code:
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: starting new backup on datastore 'PBS01-Datastore': "vm/1869/2022-02-23T15:11:38Z"
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: GET /previous: 400 Bad Request: no valid previous backup
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: created new fixed index 1 ("vm/1869/2022-02-23T15:11:38Z/drive-scsi0.img.fidx")
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: created new fixed index 2 ("vm/1869/2022-02-23T15:11:38Z/drive-scsi1.img.fidx")
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: add blob "/mnt/datastore/PBS01-Datastore/vm/1869/2022-02-23T15:11:38Z/qemu-server.conf.blob" (358 bytes, comp: 358)
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: add blob "/mnt/datastore/PBS01-Datastore/vm/1869/2022-02-23T15:11:38Z/fw.conf.blob" (138 bytes, comp: 138)
Feb 23 17:14:11 pbs-01 systemd[1]: session-1.scope: Succeeded.
Feb 23 17:14:11 pbs-01 systemd[1]: session-3.scope: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[1]: Stopping User Manager for UID 0...
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Main User Target.
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Basic System.
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Paths.
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Sockets.
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Timers.
Feb 23 17:14:21 pbs-01 systemd[6105]: dirmngr.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG network certificate management daemon.
Feb 23 17:14:21 pbs-01 systemd[6105]: gpg-agent-browser.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Feb 23 17:14:21 pbs-01 systemd[6105]: gpg-agent-extra.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Feb 23 17:14:21 pbs-01 systemd[6105]: gpg-agent-ssh.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Feb 23 17:14:21 pbs-01 systemd[6105]: gpg-agent.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG cryptographic agent and passphrase cache.
Feb 23 17:14:21 pbs-01 systemd[6105]: Removed slice User Application Slice.
Feb 23 17:14:21 pbs-01 systemd[6105]: Reached target Shutdown.
Feb 23 17:14:21 pbs-01 systemd[6105]: systemd-exit.service: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Finished Exit the Session.
Feb 23 17:14:21 pbs-01 systemd[6105]: Reached target Exit the Session.
Feb 23 17:14:21 pbs-01 systemd[1]: user@0.service: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[1]: Stopped User Manager for UID 0.
Feb 23 17:14:21 pbs-01 systemd[1]: Stopping User Runtime Directory /run/user/0...
Feb 23 17:14:21 pbs-01 systemd[1]: run-user-0.mount: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[1]: user-runtime-dir@0.service: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[1]: Stopped User Runtime Directory /run/user/0.
Feb 23 17:14:21 pbs-01 systemd[1]: Removed slice User Slice of UID 0.
Feb 23 17:14:21 pbs-01 systemd[1]: user-0.slice: Consumed 1min 41.746s CPU time.
Feb 23 17:14:56 pbs-01 proxmox-backup-proxy[5448]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/PBS01-Datastore/vm/1869/2022-02-23T15:11:38Z/index.json.blob"' - No such file or directory (os error 2)'
Feb 23 17:14:56 pbs-01 proxmox-backup-proxy[5448]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/PBS01-Datastore/vm/1403/2022-02-23T14:17:09Z/index.json.blob"' - No such file or directory (os error 2)'
Feb 23 17:15:01 pbs-01 CRON[1363317]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Feb 23 17:16:00 pbs-01 proxmox-backup-proxy[5448]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/PBS01-Datastore/vm/1869/2022-02-23T15:11:38Z/index.json.blob"' - No such file or directory (os error 2)'
Feb 23 17:16:00 pbs-01 proxmox-backup-proxy[5448]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/PBS01-Datastore/vm/1403/2022-02-23T14:17:09Z/index.json.blob"' - No such file or directory (os error 2)'

So one of the backups, a large 6TB VM, started fine and progressed until about 01:00 this morning, after which things 'broke': no further syslog or kern.log entries, while both the PVE host and the PBS server indicated that the backup was still ongoing. It simply stopped progressing and hung.
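If anyone wants to look at the same window on their side, the surrounding journal can be pulled with something like this (the time range is only an example):

Code:
# full journal around the time the backup stalled
journalctl --since "2022-02-23 00:30" --until "2022-02-23 02:00"
# the backup daemon's own messages for the same window
journalctl -u proxmox-backup-proxy --since "2022-02-23 00:30" --until "2022-02-23 02:00"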

We're looking at whether this could be linked to the special and/or SLOG/ZIL devices (the SLOG/ZIL device in particular looks like it is not utilised and serves no purpose in a PBS setup).
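One way to verify that is watching the per-vdev I/O during a backup, e.g. (the 5-second interval is arbitrary):

Code:
# per-vdev I/O statistics every 5 seconds - shows whether the special
# and log vdevs actually receive any writes during a backup
zpool iostat -v PBS01-Datastore 5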

Anything else we should be looking at/any ideas?

Kind regards,

Angelo.
 
Well, you definitely need more logs from closer to when the problem appeared. If the log files on disk don't contain the relevant time span, it's possible you triggered some kernel bug that prevented writing to disk - in that case a serial console or netconsole might help collect the relevant info.
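A minimal netconsole sketch for that - all addresses, the port numbers, the interface name and the MAC are placeholders that have to be adapted:

Code:
# on the PBS host: stream kernel messages over UDP to a collector
# format: local-port@local-ip/interface,remote-port@remote-ip/remote-mac
modprobe netconsole netconsole=6665@192.168.1.50/eth0,6666@192.168.1.10/aa:bb:cc:dd:ee:ff
# on the collector host: listen for the messages (exact netcat syntax varies by variant)
nc -u -l 6666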
 
Good day

There are three VMs: two running Windows and one running Linux.

This is the error I get when trying to create a backup of one of the Windows machines.

Thanks



Code:
INFO: starting new backup job: vzdump 101 --remove 0 --notification-mode auto --mode snapshot --storage storage_backup --compress zstd --notes-template '{{guestname}}' --node pve
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2024-10-29 16:18:59
INFO: status = running
INFO: VM Name: W2022S
INFO: include disk 'virtio0' 'vm:101/vm-101-disk-2.qcow2' 132G
INFO: include disk 'virtio1' 'vm:101/vm-101-disk-3.qcow2' 256G
INFO: include disk 'efidisk0' 'vm:101/vm-101-disk-0.qcow2' 528K
INFO: include disk 'tpmstate0' 'vm:101/vm-101-disk-1.raw' 17K
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/data/backup/dump/vzdump-qemu-101-2024_10_29-16_18_59.vma.zst'
INFO: attaching TPM drive to QEMU for backup
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '0a6e9ae3-3c37-48dc-aff9-6c7bdb34b81c'
INFO: resuming VM again
INFO: 0% (659.8 MiB of 388.0 GiB) in 3s, read: 219.9 MiB/s, write: 206.0 MiB/s
INFO: 1% (3.9 GiB of 388.0 GiB) in 32s, read: 116.7 MiB/s, write: 107.1 MiB/s
INFO: 2% (7.8 GiB of 388.0 GiB) in 1m 7s, read: 113.4 MiB/s, write: 108.9 MiB/s
INFO: 3% (11.7 GiB of 388.0 GiB) in 1m 41s, read: 116.2 MiB/s, write: 114.3 MiB/s
INFO: 4% (15.6 GiB of 388.0 GiB) in 2m 29s, read: 82.9 MiB/s, write: 82.5 MiB/s
INFO: 5% (19.5 GiB of 388.0 GiB) in 3m 10s, read: 97.3 MiB/s, write: 97.0 MiB/s
INFO: 6% (23.4 GiB of 388.0 GiB) in 3m 50s, read: 100.2 MiB/s, write: 100.1 MiB/s
INFO: 7% (27.2 GiB of 388.0 GiB) in 4m 29s, read: 101.5 MiB/s, write: 101.5 MiB/s
INFO: 8% (31.1 GiB of 388.0 GiB) in 5m 8s, read: 101.4 MiB/s, write: 101.4 MiB/s
INFO: 9% (35.0 GiB of 388.0 GiB) in 5m 44s, read: 110.1 MiB/s, write: 110.1 MiB/s
INFO: 10% (38.9 GiB of 388.0 GiB) in 6m 24s, read: 99.2 MiB/s, write: 99.2 MiB/s
INFO: 11% (42.7 GiB of 388.0 GiB) in 7m 4s, read: 99.5 MiB/s, write: 99.5 MiB/s
INFO: 12% (46.6 GiB of 388.0 GiB) in 7m 46s, read: 93.4 MiB/s, write: 93.4 MiB/s
INFO: 13% (50.5 GiB of 388.0 GiB) in 8m 27s, read: 97.6 MiB/s, write: 97.6 MiB/s
INFO: 14% (54.3 GiB of 388.0 GiB) in 9m 8s, read: 96.4 MiB/s, write: 96.4 MiB/s
INFO: 15% (58.3 GiB of 388.0 GiB) in 9m 49s, read: 97.7 MiB/s, write: 97.7 MiB/s
INFO: 16% (62.1 GiB of 388.0 GiB) in 10m 30s, read: 96.2 MiB/s, write: 96.2 MiB/s
INFO: 17% (66.0 GiB of 388.0 GiB) in 11m 11s, read: 97.0 MiB/s, write: 97.0 MiB/s
INFO: 18% (69.9 GiB of 388.0 GiB) in 11m 52s, read: 97.7 MiB/s, write: 97.7 MiB/s
INFO: 19% (73.8 GiB of 388.0 GiB) in 12m 34s, read: 93.9 MiB/s, write: 93.9 MiB/s
INFO: 20% (77.6 GiB of 388.0 GiB) in 13m 3s, read: 136.7 MiB/s, write: 136.7 MiB/s
INFO: 21% (81.5 GiB of 388.0 GiB) in 13m 43s, read: 99.3 MiB/s, write: 99.3 MiB/s
INFO: 22% (85.4 GiB of 388.0 GiB) in 14m 22s, read: 101.6 MiB/s, write: 101.6 MiB/s
INFO: 23% (89.3 GiB of 388.0 GiB) in 15m 2s, read: 101.6 MiB/s, write: 101.6 MiB/s
INFO: 24% (93.1 GiB of 388.0 GiB) in 15m 40s, read: 102.3 MiB/s, write: 102.3 MiB/s
INFO: 25% (97.0 GiB of 388.0 GiB) in 16m 23s, read: 93.0 MiB/s, write: 93.0 MiB/s
INFO: 26% (100.9 GiB of 388.0 GiB) in 17m 7s, read: 90.4 MiB/s, write: 90.4 MiB/s
INFO: 27% (104.8 GiB of 388.0 GiB) in 17m 50s, read: 91.9 MiB/s, write: 91.9 MiB/s
INFO: 28% (108.7 GiB of 388.0 GiB) in 18m 32s, read: 94.3 MiB/s, write: 94.3 MiB/s
INFO: 29% (116.0 GiB of 388.0 GiB) in 19m 15s, read: 175.0 MiB/s, write: 92.3 MiB/s
INFO: 33% (129.8 GiB of 388.0 GiB) in 19m 18s, read: 4.6 GiB/s, write: 0 B/s
INFO: 36% (143.5 GiB of 388.0 GiB) in 19m 21s, read: 4.6 GiB/s, write: 0 B/s
INFO: 46% (179.5 GiB of 388.0 GiB) in 19m 24s, read: 12.0 GiB/s, write: 0 B/s
INFO: 49% (193.8 GiB of 388.0 GiB) in 19m 27s, read: 4.8 GiB/s, write: 0 B/s
INFO: 52% (203.2 GiB of 388.0 GiB) in 19m 30s, read: 3.1 GiB/s, write: 0 B/s
INFO: 60% (233.5 GiB of 388.0 GiB) in 19m 33s, read: 10.1 GiB/s, write: 21.3 KiB/s
INFO: 64% (250.5 GiB of 388.0 GiB) in 19m 36s, read: 5.7 GiB/s, write: 0 B/s
INFO: 66% (256.3 GiB of 388.0 GiB) in 19m 39s, read: 1.9 GiB/s, write: 53.7 MiB/s
INFO: 67% (260.0 GiB of 388.0 GiB) in 20m 18s, read: 98.3 MiB/s, write: 97.4 MiB/s
INFO: 68% (263.9 GiB of 388.0 GiB) in 20m 56s, read: 103.7 MiB/s, write: 102.4 MiB/s
INFO: 69% (267.8 GiB of 388.0 GiB) in 21m 24s, read: 144.5 MiB/s, write: 143.2 MiB/s
INFO: 69% (269.7 GiB of 388.0 GiB) in 21m 31s, read: 281.1 MiB/s, write: 280.1 MiB/s
ERROR: job failed with err -61 - No data available
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 101 failed - job failed with err -61 - No data available
INFO: Failed at 2024-10-29 16:40:33
INFO: Backup job finished with errors
TASK ERROR: job errors
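For what it's worth, err -61 is the Linux errno for ENODATA ("No data available"), so it may be worth checking whether the kernel or the underlying disks logged anything at that moment, roughly like this (time range and device name are only examples):

Code:
# kernel messages around the time of the failure
journalctl -k --since "2024-10-29 16:35" --until "2024-10-29 16:45"
# SMART health of the disk backing the qcow2 images (example device)
smartctl -a /dev/sda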
 