PBS crash/freeze during backups to an S3 backed datastore

Maiko

Member
Feb 13, 2021
5
0
21
31
Hello,

I'm currently trying the new S3 feature from PBS 4.
I have three cluster getting backed up by the same PBS instance, into 2 different DS, one local, one S3 for a new cluster.

PBS freeze during backup, the Web UI become unresponsive, the backup job fail, i need to restart proxmox-backup-proxy to fix it.

The instance is running in a Proxmox VE VM:
Code:
root@proxmox-2:~# cat /etc/pve/qemu-server/1002.conf
agent: 1,freeze-fs-on-backup=0,fstrim_cloned_disks=1
balloon: 4096
boot: order=scsi0
cores: 4
cpu: host,flags=+aes
cpuunits: 7500
hotplug: 0
machine: q35
memory: 8192
name: pbs
net0: virtio=02:00:00:66:1c:ff,bridge=vmbr0
net1: virtio=BC:24:11:7A:35:09,bridge=pbs
numa: 0
onboot: 1
ostype: l26
scsi0: rpool:vm-1002-disk-0,discard=on,iothread=1,size=32G,ssd=1
scsi1: rpool:vm-1002-disk-1,discard=on,iothread=1,size=256G,ssd=1
scsi2: rpool:vm-1002-disk-2,backup=0,discard=on,iothread=1,size=1256G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=3fe8e153-c525-4b85-a5c6-11d0a9b117d1
sockets: 1
startup: order=3,up=5,down=120
tablet: 0
tags: perso
vcpus: 4
vmgenid: fc406f36-79a4-4cfb-8786-cf1d82167d47

I have three datastore:
DATASTORE_D S3
DATASTORE_R S3
pbs-local (local DS that I wanted to migrate to DATASTORE_R)

S3 target are OVHCloud bucket located in their WAW region.

When the backup job is running, PBS just straight up freeze (Web UI included), with nothing in the log:

Code:
root@pbs:~# date
Sat Sep  6 09:17:12 AM CEST 2025

root@pbs:~# time curl https://localhost:8007
^C

real    0m21.952s
user    0m0.022s
sys    0m0.004s

root@pbs:~# systemctl status proxmox-backup
● proxmox-backup.service - Proxmox Backup API Server
     Loaded: loaded (/usr/lib/systemd/system/proxmox-backup.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-09-05 13:15:40 CEST; 20h ago
 Invocation: c7deca2e28cf41579ff93bdb70ad9161
   Main PID: 668 (proxmox-backup-)
      Tasks: 8 (limit: 9443)
     Memory: 27.3M (peak: 435.8M, swap: 9M, swap peak: 10.4M)
        CPU: 1h 43min 6.655s
     CGroup: /system.slice/proxmox-backup.service
             └─668 /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-backup-api

Sep 05 16:34:22 pbs proxmox-backup-api[668]: authentication failure; rhost=[::ffff:xx.xx.xx.xx]:53440 user=root@pam msg=invalid tfa response
Sep 05 17:05:59 pbs proxmox-backup-api[668]: notified via target `mail-to-root`
Sep 05 21:21:23 pbs proxmox-backup-api[668]: notified via target `mail-to-root`
Sep 05 21:22:08 pbs proxmox-backup-api[668]: notified via target `mail-to-root`
Sep 06 00:00:18 pbs proxmox-backup-api[668]: re-opening access-log file
Sep 06 00:00:25 pbs proxmox-backup-api[668]: re-opening auth-log file
Sep 06 00:13:41 pbs proxmox-backup-api[668]: notified via target `mail-to-root`
Sep 06 09:11:03 pbs proxmox-backup-api[668]: pam_unix(proxmox-backup-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=::ffff:xx.xx.xx.xx  user=root
Sep 06 09:11:05 pbs proxmox-backup-api[668]: authentication failure; rhost=[::ffff:xx.xx.xx.xx]:57130 user=root@pam msg=authentication error - AUTH_ERR (7)
Sep 06 09:11:09 pbs proxmox-backup-api[668]: authentication failure; rhost=[::ffff:xx.xx.xx.xx]:57131 user=root@pam msg=invalid tfa response
root@pbs:~# systemctl status proxmox-backup-proxy
● proxmox-backup-proxy.service - Proxmox Backup API Proxy Server
     Loaded: loaded (/usr/lib/systemd/system/proxmox-backup-proxy.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-09-05 13:15:40 CEST; 20h ago
 Invocation: bc190e81b4394f71b40165a583c9334a
   Main PID: 723 (proxmox-backup-)
      Tasks: 7 (limit: 9443)
     Memory: 1G (peak: 3G, swap: 26M, swap peak: 52.4M)
        CPU: 3h 58min 7.362s
     CGroup: /system.slice/proxmox-backup-proxy.service
             └─723 /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-backup-proxy

Sep 06 04:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.014 seconds)
Sep 06 05:15:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.015 seconds)
Sep 06 05:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.041 seconds)
Sep 06 06:15:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.046 seconds)
Sep 06 06:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.039 seconds)
Sep 06 07:15:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.127 seconds)
Sep 06 07:45:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.015 seconds)
Sep 06 08:15:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.063 seconds)
Sep 06 08:45:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.057 seconds)
Sep 06 09:11:48 pbs proxmox-backup-proxy[723]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/303/2025-09-05T04:13:51Z/index.json.blob"' - No such fil>
root@pbs:~# systemctl restart proxmox-backup-proxy
root@pbs:~# systemctl status proxmox-backup-proxy
● proxmox-backup-proxy.service - Proxmox Backup API Proxy Server
     Loaded: loaded (/usr/lib/systemd/system/proxmox-backup-proxy.service; enabled; preset: enabled)
     Active: active (running) since Sat 2025-09-06 09:19:32 CEST; 9s ago
 Invocation: 18c15ae0a6fd49d091562bc58880eff8
   Main PID: 20945 (proxmox-backup-)
      Tasks: 9 (limit: 9443)
     Memory: 24.4M (peak: 26.7M)
        CPU: 452ms
     CGroup: /system.slice/proxmox-backup-proxy.service
             └─20945 /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-backup-proxy

Sep 06 09:19:31 pbs systemd[1]: Starting proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: applied rrd journal (6596 entries in 0.016 seconds)
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: catching shutdown signal
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: catching reload signal
Sep 06 09:19:32 pbs systemd[1]: Started proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: rrd journal successfully committed (41 files in 0.113 seconds)
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: Using datastore cache with capacity 8096 for store DATASTORE_D
root@pbs:~#
root@pbs:~# curl https://localhost:8007
curl: (60) SSL: no alternative certificate subject name matches target hostname 'localhost'

Code:
root@pbs:~# proxmox-backup-manager versions --verbose
proxmox-backup                      4.0.0         running kernel: 6.14.11-1-pve
proxmox-backup-server               4.0.14-1      running version: 4.0.14
proxmox-kernel-helper               9.0.4
pve-kernel-5.11                     7.0-10
proxmox-kernel-6.14.11-1-pve-signed 6.14.11-1
proxmox-kernel-6.14                 6.14.11-1
proxmox-kernel-6.14.8-2-pve-signed  6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed 6.8.12-13
proxmox-kernel-6.8                  6.8.12-13
pve-kernel-5.13.19-2-pve            5.13.19-4
pve-kernel-5.11.22-7-pve            5.11.22-12
pve-kernel-5.11.22-1-pve            5.11.22-2
ifupdown2                           3.3.0-1+pmx10
libjs-extjs                         7.0.0-5
proxmox-backup-docs                 4.0.14-1
proxmox-backup-client               4.0.14-1
proxmox-mail-forward                1.0.2
proxmox-mini-journalreader          1.6
proxmox-offline-mirror-helper       0.7.1
proxmox-widget-toolkit              5.0.5
pve-xtermjs                         5.5.0-2
smartmontools                       7.4-pve1
zfsutils-linux                      2.3.4-pve1

root@pbs:~# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
s3-cache            10.7G   235G    96K  /mnt/datastore/s3-cache
s3-cache/D  1.50G   127G  1.50G  /mnt/datastore/s3-cache/DATASTORE_D
s3-cache/R      9.18G   119G  9.18G  /mnt/datastore/s3-cache/DATASTORE_R

root@pbs:~# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
s3-cache   254G  10.7G   243G        -         -    25%     4%  1.00x    ONLINE  -

root@pbs:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  3.9G     0  3.9G   0% /dev
tmpfs                 795M  1.5M  793M   1% /run
/dev/mapper/pbs-root   24G  8.2G   14G  38% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                 1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
tmpfs                 3.9G     0  3.9G   0% /tmp
/dev/sdc1             1.3T  296G  961G  24% /mnt/datastore/pbs-local
tmpfs                 1.0M     0  1.0M   0% /run/credentials/getty@tty1.service
s3-cache              236G  128K  236G   1% /mnt/datastore/s3-cache
s3-cache/D    128G  1.5G  127G   2% /mnt/datastore/s3-cache/DATASTORE_D
s3-cache/R        128G  9.2G  119G   8% /mnt/datastore/s3-cache/DATASTORE_R
tmpfs                 795M  8.0K  795M   1% /run/user/0

root@pbs:~# proxmox-backup-manager s3 endpoint list --output-format json-pretty
[
  {
    "access-key": "XXXXX",
    "endpoint": "{{bucket}}.s3.{{region}}.io.cloud.ovh.net",
    "id": "OVHCloud",
    "provider-quirks": [
      "skip-if-none-match-header"
    ],
    "region": "waw"
  },
  {
    "access-key": "YYYYY",
    "endpoint": "{{bucket}}.s3.{{region}}.io.cloud.ovh.net",
    "id": "OVHCloud-D",
    "provider-quirks": [
      "skip-if-none-match-header"
    ],
    "region": "waw"
  }
]

root@pbs:~# proxmox-backup-manager datastore list --output-format json-pretty
[
  {
    "backend": "type=s3,client=OVHCloud-D,bucket=proxmoxbackupserver-d",
    "gc-schedule": "11:00",
    "name": "digityser",
    "path": "/mnt/datastore/s3-cache/DATASTORE_D"
  },
  {
    "backend": "bucket=pbs-r,client=OVHCloud,type=s3",
    "comment": "",
    "gc-schedule": "daily",
    "name": "DATASTORE_R",
    "notification-mode": "notification-system",
    "path": "/mnt/datastore/s3-cache/DATASTORE_R"
  },
  {
    "gc-schedule": "daily",
    "name": "pbs-local",
    "notification-mode": "legacy-sendmail",
    "notify": "gc=error,prune=error,sync=error,verify=error",
    "path": "/mnt/datastore/pbs-local"
  }
]

I'm kinda lost because I can't find an explanation for the issue in the logs. Any idea ?

Thanks.
Best,
Maiko
 
Code:
Sep 06 00:00:25 pbs proxmox-backup-proxy[723]: re-opening auth-log file
Sep 06 00:15:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.038 seconds)
Sep 06 00:45:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.041 seconds)
Sep 06 01:15:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.018 seconds)
Sep 06 01:45:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.040 seconds)
Sep 06 02:15:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.045 seconds)
Sep 06 02:45:49 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.046 seconds)
Sep 06 03:15:13 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/100/2025-09-06T01:15:05Z/client.log.blob
Sep 06 03:15:46 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' vm/201/2025-09-06T01:15:13Z/client.log.blob
Sep 06 03:15:49 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.081 seconds)
Sep 06 03:17:17 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/301/2025-09-06T01:15:46Z/client.log.blob
Sep 06 03:17:33 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' vm/116/2025-09-06T01:15:06Z/client.log.blob
Sep 06 03:17:47 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/999/2025-09-06T01:17:17Z/client.log.blob
Sep 06 03:17:59 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/10999/2025-09-06T01:15:05Z/client.log.blob
Sep 06 03:18:56 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/123/2025-09-06T01:17:33Z/client.log.blob
Sep 06 03:35:09 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' vm/100/2025-09-06T01:35:06Z/client.log.blob
Sep 06 03:35:59 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' ct/102/2025-09-06T01:35:00Z/client.log.blob
Sep 06 03:36:17 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' ct/153/2025-09-06T01:35:59Z/client.log.blob
Sep 06 03:36:23 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' vm/254/2025-09-06T01:36:17Z/client.log.blob
Sep 06 03:37:49 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' ct/103/2025-09-06T01:35:07Z/client.log.blob
Sep 06 03:45:49 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.097 seconds)
Sep 06 04:15:49 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.101 seconds)
Sep 06 04:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.014 seconds)
Sep 06 05:15:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.015 seconds)
Sep 06 05:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.041 seconds)
Sep 06 06:15:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.046 seconds)
Sep 06 06:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.039 seconds)
Sep 06 07:15:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.127 seconds)
Sep 06 07:45:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.015 seconds)
Sep 06 08:15:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.063 seconds)
Sep 06 08:45:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.057 seconds)
Sep 06 09:11:48 pbs proxmox-backup-proxy[723]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/303/2025-09-05T04:13:51Z/index.json.blob"' - No such fil>
Sep 06 09:19:31 pbs systemd[1]: Stopping proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:19:31 pbs systemd[1]: proxmox-backup-proxy.service: Deactivated successfully.
Sep 06 09:19:31 pbs systemd[1]: Stopped proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:19:31 pbs systemd[1]: proxmox-backup-proxy.service: Consumed 3h 58min 7.380s CPU time, 3G memory peak, 52.4M memory swap peak.
Sep 06 09:19:31 pbs systemd[1]: Starting proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: applied rrd journal (6596 entries in 0.016 seconds)
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: catching shutdown signal
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: catching reload signal
Sep 06 09:19:32 pbs systemd[1]: Started proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: rrd journal successfully committed (41 files in 0.113 seconds)
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: Using datastore cache with capacity 8096 for store digityser
Sep 06 09:19:52 pbs proxmox-backup-proxy[20945]: Upload backup log to datastore 'digityser', namespace 'NAMESPACE_D' vm/200/2025-09-06T07:19:32Z/client.log.blob
Sep 06 09:26:10 pbs systemd[1]: Stopping proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:26:10 pbs systemd[1]: proxmox-backup-proxy.service: Deactivated successfully.
Sep 06 09:26:10 pbs systemd[1]: Stopped proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:26:10 pbs systemd[1]: proxmox-backup-proxy.service: Consumed 1.493s CPU time, 78.7M memory peak.
Sep 06 09:26:10 pbs systemd[1]: Starting proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: catching shutdown signal
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: catching reload signal
Sep 06 09:26:10 pbs systemd[1]: Started proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: applied rrd journal (166 entries in 0.018 seconds)
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: rrd journal successfully committed (41 files in 0.030 seconds)
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: Using datastore cache with capacity 8096 for store DATASTORE_D
Sep 06 09:26:11 pbs proxmox-backup-proxy[21046]: Using datastore cache with capacity 7604 for store DATASTORE_R
Sep 06 09:26:34 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/302/2025-09-06T07:12:28Z/index.json.blob"' - No such f>
Sep 06 09:26:34 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/101/2025-09-06T07:12:28Z/index.json.blob"' - No such f>
Sep 06 09:26:35 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/303/2025-09-06T07:19:53Z/index.json.blob"' - No such f>
Sep 06 09:26:35 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/303/2025-09-05T04:13:51Z/index.json.blob"' - No such f>
Sep 06 09:26:35 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/ct/999/2025-09-06T07:19:33Z/index.json.blob"' - No such f>

Backup job on the VE nodes:
Code:
INFO: starting new backup job: vzdump 101 501 999 200 301 302 303 --mailto hello@maiko-bossuyt.eu --all 0 --zstd 0 --compress zstd --mailnotification always --storage pbs --quiet 1 --node proxmox-3 --mode snapshot --fleecing '1,storage=ceph'
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2025-09-06 07:12:28
INFO: status = running
INFO: VM Name: plesk
INFO: include disk 'scsi0' 'ceph:vm-101-disk-1' 600G
INFO: include disk 'efidisk0' 'omv_backup_digilan:101/vm-101-disk-0.raw' 128K
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/101/2025-09-06T07:12:28Z'
/dev/rbd1
INFO: drive-scsi0: attaching fleecing image ceph:vm-101-fleece-0 to QEMU
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 101 qmp command 'backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
Removing image: 1% complete...
...
Removing image: 100% complete...done.
ERROR: Backup of VM 101 failed - VM 101 qmp command 'backup' failed - got timeout
INFO: Failed at 2025-09-06 07:19:32
INFO: Starting Backup of VM 200 (qemu)
INFO: Backup started at 2025-09-06 07:19:32
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: base-debian11
INFO: include disk 'scsi0' 'ceph:base-200-disk-0' 32G
/dev/rbd1
INFO: creating Proxmox Backup Server archive 'vm/200/2025-09-06T07:19:32Z'
INFO: starting kvm to execute backup task
/dev/rbd8
generating cloud-init ISO
INFO: started backup task '565f2a95-3b5c-4893-9746-cf161bbc24a6'
INFO:  17% (5.5 GiB of 32.0 GiB) in 3s, read: 1.8 GiB/s, write: 0 B/s
INFO:  31% (10.2 GiB of 32.0 GiB) in 6s, read: 1.6 GiB/s, write: 0 B/s
INFO:  55% (17.9 GiB of 32.0 GiB) in 10s, read: 1.9 GiB/s, write: 0 B/s
INFO:  80% (25.7 GiB of 32.0 GiB) in 13s, read: 2.6 GiB/s, write: 0 B/s
INFO:  98% (31.4 GiB of 32.0 GiB) in 16s, read: 1.9 GiB/s, write: 0 B/s
INFO: 100% (32.0 GiB of 32.0 GiB) in 17s, read: 636.0 MiB/s, write: 0 B/s
INFO: backup is sparse: 29.25 GiB (91%) total zero data
INFO: backup was done incrementally, reused 32.00 GiB (100%)
INFO: transferred 32.00 GiB in 17 seconds (1.9 GiB/s)
INFO: stopping kvm after backup task
INFO: Finished Backup of VM 200 (00:00:20)
INFO: Backup finished at 2025-09-06 07:19:52
INFO: Starting Backup of VM 303 (qemu)
INFO: Backup started at 2025-09-06 07:19:53
INFO: status = running
INFO: VM Name: c3.cloud
INFO: include disk 'scsi0' 'ceph:vm-303-disk-1' 32G
INFO: include disk 'efidisk0' 'ceph:vm-303-disk-0' 528K
INFO: include disk 'tpmstate0' 'ceph:vm-303-disk-2' 4M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/303/2025-09-06T07:19:53Z'
INFO: attaching TPM drive to QEMU for backup
/dev/rbd1
INFO: drive-scsi0: attaching fleecing image ceph:vm-303-fleece-0 to QEMU
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '99170614-5fbd-4a64-9706-e785ebc03f32'
INFO: resuming VM again
INFO: efidisk0: dirty-bitmap status: OK (drive clean)
INFO: scsi0: dirty-bitmap status: OK (296.0 MiB of 32.0 GiB dirty)
INFO: tpmstate0-backup: dirty-bitmap status: created new
INFO: using fast incremental mode (dirty-bitmap), 300.0 MiB dirty of 32.0 GiB total
INFO: 100% (300.0 MiB of 300.0 MiB) in 3s, read: 100.0 MiB/s, write: 94.7 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job

INFO: Starting Backup of VM 302 (qemu)
INFO: Backup started at 2025-09-06 09:12:28
INFO: status = running
INFO: VM Name: c2.cloud
INFO: include disk 'scsi0' 'ceph:vm-302-disk-1' 32G
INFO: include disk 'efidisk0' 'ceph:vm-302-disk-0' 528K
INFO: include disk 'tpmstate0' 'ceph:vm-302-disk-2' 4M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/302/2025-09-06T07:12:28Z'
INFO: attaching TPM drive to QEMU for backup
trying to acquire cfs lock 'storage-ceph' ...
/dev/rbd5
INFO: drive-scsi0: attaching fleecing image ceph:vm-302-fleece-0 to QEMU
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '12324d0c-f399-4c45-81ba-4acfa84227ce'
INFO: resuming VM again
INFO: efidisk0: dirty-bitmap status: OK (drive clean)
INFO: scsi0: dirty-bitmap status: OK (312.0 MiB of 32.0 GiB dirty)
INFO: tpmstate0-backup: dirty-bitmap status: created new
INFO: using fast incremental mode (dirty-bitmap), 316.0 MiB dirty of 32.0 GiB total
INFO: 100% (316.0 MiB of 316.0 MiB) in 3s, read: 105.3 MiB/s, write: 102.7 MiB/s
ERROR: backup close image failed: command error: stream closed because of a broken pipe
INFO: aborting backup job
INFO: resuming VM again
trying to acquire cfs lock 'storage-ceph' ...
Removing image: 1% complete...
...
Removing image: 100% complete...done.
ERROR: Backup of VM 302 failed - backup close image failed: command error: stream closed because of a broken pipe
INFO: Failed at 2025-09-06 09:19:33



root@proxmox-3:~# pveversion --verbose
proxmox-ve: 8.3.0 (running kernel: 6.8.12-1-pve)
pve-manager: 8.3.2 (running version: 8.3.2/3e76eec21c4a14a7)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-15
proxmox-kernel-6.8: 6.8.12-5
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
pve-kernel-5.4: 6.4-12
pve-kernel-5.15.158-2-pve: 5.15.158-2
pve-kernel-5.4.162-1-pve: 5.4.162-2
ceph: 18.2.4-pve3
ceph-fuse: 18.2.4-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.3
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-2
pve-ha-manager: 4.0.6
pve-i18n: 3.3.2
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
pve-zsync: 2.3.1
qemu-server: 8.3.3
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1
 
I have upgraded PVE on that cluster and I can't reproduce the issue anymore.

was:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-1-pve),
proxmox-backup-client: 3.3.2-1

now: pve-manager: 8.4.12 (running version: 8.4.12/c2ea8261d32a5020),
proxmox-backup-client: 3.4.6-1

I'll keep the post open waiting for all my scheduled job to run tonight (concurrency seems to be key to trigger this behaviour) and update the post tomorrow.
 
I can confirm the freezing of the pbs.
I can backup succesfully vms with disksize of 32GB or smaller to the S3 datastore.
When I start a vm (actually a Windows 11 with 64GB disk size) it goes through all the way to 100% (shown in the taskviewer on the proxmox node), and then it freezes. When at the same time on the pbs I have the taskviewer open I see the chuncks passing by, but the moment 100% backup is reached and the node backup freezes, the chuncks list freezes as well. And also the dashboard freezes and times out. Only a reboot will return the dashboard. In pbs the backup shows up as size 0.

When I close the taskviewer on the pbs and re-open, it now shows:


Screenshot 2025-09-06 162335.jpg
 
Ok, addition. I set the option Skip If-None-Match header for the S3 I am using. And now I can backup the 64GB vm.
No more freeze. I will investigate further.