Hi,
I don't see any follow-up to this, but we're having precisely the same problem.
This is a brand new PBS installation - all updates are done - and the zpool layout is as follows:
Code:
root@pbs-01:~# zpool status
  pool: PBS01-Datastore
 state: ONLINE
  scan: scrub repaired 0B in 00:17:24 with 0 errors on Tue Feb 22 19:40:51 2022
config:
        NAME                 STATE     READ WRITE CKSUM
        PBS01-Datastore      ONLINE       0     0     0
          mirror-0           ONLINE       0     0     0
            sdu              ONLINE       0     0     0
            sdv              ONLINE       0     0     0
            sdw              ONLINE       0     0     0
          mirror-1           ONLINE       0     0     0
            sdaa             ONLINE       0     0     0
            sdab             ONLINE       0     0     0
            sdac             ONLINE       0     0     0
          mirror-2           ONLINE       0     0     0
            sdad             ONLINE       0     0     0
            sdae             ONLINE       0     0     0
            sdaf             ONLINE       0     0     0
          mirror-3           ONLINE       0     0     0
            sdag             ONLINE       0     0     0
            sdah             ONLINE       0     0     0
            sdai             ONLINE       0     0     0
          mirror-4           ONLINE       0     0     0
            sdaj             ONLINE       0     0     0
            sdak             ONLINE       0     0     0
            sdal             ONLINE       0     0     0
          mirror-5           ONLINE       0     0     0
            sdi              ONLINE       0     0     0
            sdj              ONLINE       0     0     0
            sdk              ONLINE       0     0     0
          mirror-6           ONLINE       0     0     0
            sdl              ONLINE       0     0     0
            sdm              ONLINE       0     0     0
            sdn              ONLINE       0     0     0
          mirror-7           ONLINE       0     0     0
            sdo              ONLINE       0     0     0
            sdp              ONLINE       0     0     0
            sdq              ONLINE       0     0     0
          mirror-8           ONLINE       0     0     0
            sdr              ONLINE       0     0     0
            sds              ONLINE       0     0     0
            sdt              ONLINE       0     0     0
        special
          mirror-9           ONLINE       0     0     0
            sdd              ONLINE       0     0     0
            sde              ONLINE       0     0     0
            sdf              ONLINE       0     0     0
        logs
          mirror-10          ONLINE       0     0     0
            sdg              ONLINE       0     0     0
            sdh              ONLINE       0     0     0
errors: No known data errors
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:13 with 0 errors on Tue Feb 22 19:23:31 2022
config:
        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    ONLINE       0     0     0
          mirror-0                                               ONLINE       0     0     0
            ata-SAMSUNG_MZ7KM960HAHP-00005_S2HTNX0J400741-part3  ONLINE       0     0     0
            ata-SAMSUNG_MZ7KM960HAHP-00005_S2HTNX0J400728-part3  ONLINE       0     0     0
            ata-SAMSUNG_MZ7KM960HAHP-00005_S2HTNX0H413627-part3  ONLINE       0     0     0
errors: No known data errors
The 'rpool' contains the OS (a 3-way SSD mirror), and PBS01-Datastore contains the backup datastore. It is built from nine 3-way HDD mirror vdevs (ZFS mirrors, not RAIDZ), plus an SSD-based 3-way mirrored SPECIAL device and a 2-way mirrored SLOG/ZIL device. The HDDs are HGST 12TB drives in a 44-unit Supermicro JBOD.
The PBS 'compute' unit is a 2x Intel Gold 6150 Supermicro Ultra with 384GB RAM.
So on paper, we should have no performance issues - and these errors we are getting are purely during test, not production usage.
We are seeing this (as an example) in our logs, along with failing backups:
Code:
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: starting new backup on datastore 'PBS01-Datastore': "vm/1869/2022-02-23T15:11:38Z"
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: GET /previous: 400 Bad Request: no valid previous backup
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: created new fixed index 1 ("vm/1869/2022-02-23T15:11:38Z/drive-scsi0.img.fidx")
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: created new fixed index 2 ("vm/1869/2022-02-23T15:11:38Z/drive-scsi1.img.fidx")
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: add blob "/mnt/datastore/PBS01-Datastore/vm/1869/2022-02-23T15:11:38Z/qemu-server.conf.blob" (358 bytes, comp: 358)
Feb 23 17:11:41 pbs-01 proxmox-backup-proxy[5448]: add blob "/mnt/datastore/PBS01-Datastore/vm/1869/2022-02-23T15:11:38Z/fw.conf.blob" (138 bytes, comp: 138)
Feb 23 17:14:11 pbs-01 systemd[1]: session-1.scope: Succeeded.
Feb 23 17:14:11 pbs-01 systemd[1]: session-3.scope: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[1]: Stopping User Manager for UID 0...
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Main User Target.
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Basic System.
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Paths.
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Sockets.
Feb 23 17:14:21 pbs-01 systemd[6105]: Stopped target Timers.
Feb 23 17:14:21 pbs-01 systemd[6105]: dirmngr.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG network certificate management daemon.
Feb 23 17:14:21 pbs-01 systemd[6105]: gpg-agent-browser.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Feb 23 17:14:21 pbs-01 systemd[6105]: gpg-agent-extra.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Feb 23 17:14:21 pbs-01 systemd[6105]: gpg-agent-ssh.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Feb 23 17:14:21 pbs-01 systemd[6105]: gpg-agent.socket: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Closed GnuPG cryptographic agent and passphrase cache.
Feb 23 17:14:21 pbs-01 systemd[6105]: Removed slice User Application Slice.
Feb 23 17:14:21 pbs-01 systemd[6105]: Reached target Shutdown.
Feb 23 17:14:21 pbs-01 systemd[6105]: systemd-exit.service: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[6105]: Finished Exit the Session.
Feb 23 17:14:21 pbs-01 systemd[6105]: Reached target Exit the Session.
Feb 23 17:14:21 pbs-01 systemd[1]: user@0.service: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[1]: Stopped User Manager for UID 0.
Feb 23 17:14:21 pbs-01 systemd[1]: Stopping User Runtime Directory /run/user/0...
Feb 23 17:14:21 pbs-01 systemd[1]: run-user-0.mount: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[1]: user-runtime-dir@0.service: Succeeded.
Feb 23 17:14:21 pbs-01 systemd[1]: Stopped User Runtime Directory /run/user/0.
Feb 23 17:14:21 pbs-01 systemd[1]: Removed slice User Slice of UID 0.
Feb 23 17:14:21 pbs-01 systemd[1]: user-0.slice: Consumed 1min 41.746s CPU time.
Feb 23 17:14:56 pbs-01 proxmox-backup-proxy[5448]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/PBS01-Datastore/vm/1869/2022-02-23T15:11:38Z /index.json.blob"' - No such file or directory (os error 2)'
Feb 23 17:14:56 pbs-01 proxmox-backup-proxy[5448]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/PBS01-Datastore/vm/1403/2022-02-23T14:17:09Z /index.json.blob"' - No such file or directory (os error 2)'
Feb 23 17:15:01 pbs-01 CRON[1363317]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Feb 23 17:16:00 pbs-01 proxmox-backup-proxy[5448]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/PBS01-Datastore/vm/1869/2022-02-23T15:11:38Z /index.json.blob"' - No such file or directory (os error 2)'
Feb 23 17:16:00 pbs-01 proxmox-backup-proxy[5448]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/PBS01-Datastore/vm/1403/2022-02-23T14:17:09Z /index.json.blob"' - No such file or directory (os error 2)'
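To see at a glance which snapshots keep triggering the "unable to load blob" error, something like the following could be run over the log. This is only a minimal sketch: the regular expression is derived from the handful of lines pasted above (including the stray space before /index.json.blob), and the function name and script name are made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches the "unable to load blob" errors from the log excerpt and captures
# backup type, ID and snapshot timestamp out of the datastore path.
# Assumption: the path always ends in <type>/<id>/<timestamp>/index.json.blob,
# optionally with whitespace before the final slash, as in the pasted lines.
ERROR_RE = re.compile(
    r"error during snapshot file listing: .*?"
    r"/(?P<type>vm|ct|host)/(?P<id>[^/]+)/(?P<time>[^/\s\"]+)\s*/index\.json\.blob"
)


def failing_snapshots(lines):
    """Count how often each snapshot shows up in a listing error."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[f"{match['type']}/{match['id']}/{match['time']}"] += 1
    return counts


if __name__ == "__main__":
    # Reads log lines from stdin and prints the most frequent offenders first.
    for snapshot, hits in failing_snapshots(sys.stdin).most_common():
        print(f"{hits:4d}  {snapshot}")
```

It could be fed from the journal, e.g. `journalctl -u proxmox-backup-proxy | python3 failing_snapshots.py` (the script name is hypothetical). In the excerpt above this would report two snapshots, vm/1869 and vm/1403, each hit twice.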
One of the backups, a large 6TB VM, started fine and progressed until about 01:00 this morning, after which things 'broke': there were no further syslog or kern.log entries, and all the while both the PVE host and the PBS server indicated that the backup was still ongoing, but it simply stopped progressing and hung.
We're looking at whether this could be linked to the SPECIAL and/or SLOG/ZIL devices (the SLOG/ZIL device looks like it is not actually utilised and serves no purpose in the PBS setup).
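One way to check the SLOG suspicion is to watch the per-vdev counters from `zpool iostat -v` while a backup is running: a SLOG only absorbs synchronous writes, so if the devices under 'logs' stay at zero writes it is effectively idle. The sketch below just pulls the 'logs' and 'special' blocks out of that output. It assumes the usual layout where a section header line ('special', 'logs', ...) is followed by its vdev lines, and the helper name is made up.

```python
import subprocess

# Section headers that zpool prints for auxiliary vdev classes.
SECTION_NAMES = {"special", "logs", "cache", "spares", "dedup"}


def section_lines(iostat_text, section):
    """Return the lines listed under `section` in `zpool iostat -v` output."""
    collected, inside = [], False
    for line in iostat_text.splitlines():
        fields = line.split()
        first = fields[0] if fields else ""
        if first == section:
            inside = True
            continue
        if inside:
            # A blank line, a dashed separator or another section header
            # ends the current block.
            if not first or first.startswith("-") or first in SECTION_NAMES:
                inside = False
            else:
                collected.append(line.strip())
    return collected


if __name__ == "__main__":
    # One 10-second sample; -y skips the since-boot averages.
    text = subprocess.run(
        ["zpool", "iostat", "-v", "-y", "PBS01-Datastore", "10", "1"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name in ("logs", "special"):
        print(f"== {name} ==")
        for row in section_lines(text, name):
            print(row)
```

Running it a few times during a backup should show whether mirror-10 (sdg/sdh) ever sees writes, and how busy the SPECIAL mirror is compared to the HDD mirrors.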
Anything else we should be looking at/any ideas?
Kind regards,
Angelo.