PBS crash/freeze during backups to an S3-backed datastore

Maiko

Hello,

I'm currently trying the new S3 feature in PBS 4.
I have three clusters backed up by the same PBS instance into two different datastores: one local, and one S3-backed for a new cluster.

PBS freezes during backups: the web UI becomes unresponsive, the backup job fails, and I need to restart proxmox-backup-proxy to recover.
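Not a fix, but while debugging it can help to have systemd restart the proxy automatically if it ever exits outright. A drop-in sketch (an assumption on my side, not an official recommendation; note it will not catch a pure hang where the process stays alive, as seems to be the case here):

```ini
# /etc/systemd/system/proxmox-backup-proxy.service.d/restart.conf
[Service]
Restart=on-failure
RestartSec=10
```

Followed by `systemctl daemon-reload`.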

The instance is running in a Proxmox VE VM:
Code:
root@proxmox-2:~# cat /etc/pve/qemu-server/1002.conf
agent: 1,freeze-fs-on-backup=0,fstrim_cloned_disks=1
balloon: 4096
boot: order=scsi0
cores: 4
cpu: host,flags=+aes
cpuunits: 7500
hotplug: 0
machine: q35
memory: 8192
name: pbs
net0: virtio=02:00:00:66:1c:ff,bridge=vmbr0
net1: virtio=BC:24:11:7A:35:09,bridge=pbs
numa: 0
onboot: 1
ostype: l26
scsi0: rpool:vm-1002-disk-0,discard=on,iothread=1,size=32G,ssd=1
scsi1: rpool:vm-1002-disk-1,discard=on,iothread=1,size=256G,ssd=1
scsi2: rpool:vm-1002-disk-2,backup=0,discard=on,iothread=1,size=1256G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=3fe8e153-c525-4b85-a5c6-11d0a9b117d1
sockets: 1
startup: order=3,up=5,down=120
tablet: 0
tags: perso
vcpus: 4
vmgenid: fc406f36-79a4-4cfb-8786-cf1d82167d47

I have three datastores:
DATASTORE_D S3
DATASTORE_R S3
pbs-local (local DS that I wanted to migrate to DATASTORE_R)

The S3 targets are OVHCloud buckets located in their WAW region.

When a backup job is running, PBS simply freezes (web UI included), with nothing in the logs:

Code:
root@pbs:~# date
Sat Sep  6 09:17:12 AM CEST 2025

root@pbs:~# time curl https://localhost:8007
^C

real    0m21.952s
user    0m0.022s
sys    0m0.004s

root@pbs:~# systemctl status proxmox-backup
● proxmox-backup.service - Proxmox Backup API Server
     Loaded: loaded (/usr/lib/systemd/system/proxmox-backup.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-09-05 13:15:40 CEST; 20h ago
 Invocation: c7deca2e28cf41579ff93bdb70ad9161
   Main PID: 668 (proxmox-backup-)
      Tasks: 8 (limit: 9443)
     Memory: 27.3M (peak: 435.8M, swap: 9M, swap peak: 10.4M)
        CPU: 1h 43min 6.655s
     CGroup: /system.slice/proxmox-backup.service
             └─668 /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-backup-api

Sep 05 16:34:22 pbs proxmox-backup-api[668]: authentication failure; rhost=[::ffff:xx.xx.xx.xx]:53440 user=root@pam msg=invalid tfa response
Sep 05 17:05:59 pbs proxmox-backup-api[668]: notified via target `mail-to-root`
Sep 05 21:21:23 pbs proxmox-backup-api[668]: notified via target `mail-to-root`
Sep 05 21:22:08 pbs proxmox-backup-api[668]: notified via target `mail-to-root`
Sep 06 00:00:18 pbs proxmox-backup-api[668]: re-opening access-log file
Sep 06 00:00:25 pbs proxmox-backup-api[668]: re-opening auth-log file
Sep 06 00:13:41 pbs proxmox-backup-api[668]: notified via target `mail-to-root`
Sep 06 09:11:03 pbs proxmox-backup-api[668]: pam_unix(proxmox-backup-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=::ffff:xx.xx.xx.xx  user=root
Sep 06 09:11:05 pbs proxmox-backup-api[668]: authentication failure; rhost=[::ffff:xx.xx.xx.xx]:57130 user=root@pam msg=authentication error - AUTH_ERR (7)
Sep 06 09:11:09 pbs proxmox-backup-api[668]: authentication failure; rhost=[::ffff:xx.xx.xx.xx]:57131 user=root@pam msg=invalid tfa response
root@pbs:~# systemctl status proxmox-backup-proxy
● proxmox-backup-proxy.service - Proxmox Backup API Proxy Server
     Loaded: loaded (/usr/lib/systemd/system/proxmox-backup-proxy.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-09-05 13:15:40 CEST; 20h ago
 Invocation: bc190e81b4394f71b40165a583c9334a
   Main PID: 723 (proxmox-backup-)
      Tasks: 7 (limit: 9443)
     Memory: 1G (peak: 3G, swap: 26M, swap peak: 52.4M)
        CPU: 3h 58min 7.362s
     CGroup: /system.slice/proxmox-backup-proxy.service
             └─723 /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-backup-proxy

Sep 06 04:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.014 seconds)
Sep 06 05:15:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.015 seconds)
Sep 06 05:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.041 seconds)
Sep 06 06:15:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.046 seconds)
Sep 06 06:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.039 seconds)
Sep 06 07:15:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.127 seconds)
Sep 06 07:45:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.015 seconds)
Sep 06 08:15:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.063 seconds)
Sep 06 08:45:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.057 seconds)
Sep 06 09:11:48 pbs proxmox-backup-proxy[723]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/303/2025-09-05T04:13:51Z/index.json.blob"' - No such fil>
root@pbs:~# systemctl restart proxmox-backup-proxy
root@pbs:~# systemctl status proxmox-backup-proxy
● proxmox-backup-proxy.service - Proxmox Backup API Proxy Server
     Loaded: loaded (/usr/lib/systemd/system/proxmox-backup-proxy.service; enabled; preset: enabled)
     Active: active (running) since Sat 2025-09-06 09:19:32 CEST; 9s ago
 Invocation: 18c15ae0a6fd49d091562bc58880eff8
   Main PID: 20945 (proxmox-backup-)
      Tasks: 9 (limit: 9443)
     Memory: 24.4M (peak: 26.7M)
        CPU: 452ms
     CGroup: /system.slice/proxmox-backup-proxy.service
             └─20945 /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-backup-proxy

Sep 06 09:19:31 pbs systemd[1]: Starting proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: applied rrd journal (6596 entries in 0.016 seconds)
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: catching shutdown signal
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: catching reload signal
Sep 06 09:19:32 pbs systemd[1]: Started proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: rrd journal successfully committed (41 files in 0.113 seconds)
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: Using datastore cache with capacity 8096 for store DATASTORE_D
root@pbs:~#
root@pbs:~# curl https://localhost:8007
curl: (60) SSL: no alternative certificate subject name matches target hostname 'localhost'
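For what it's worth, the API can be probed without the shell hanging by bounding the request. A minimal sketch (the host, port, and timeout defaults are just illustrative; `-k` skips certificate verification, since the default PBS certificate doesn't cover 'localhost'):

```shell
# Probe the PBS API with a hard timeout so a hung proxy doesn't block the shell.
# Prints the HTTP status code on success; curl exits 28 on timeout, 7 if nothing listens.
probe_pbs() {
  host="${1:-localhost}"
  port="${2:-8007}"
  timeout_s="${3:-5}"
  curl -sk --max-time "$timeout_s" -o /dev/null -w '%{http_code}\n' "https://${host}:${port}/"
}
```

Checking the exit code afterwards distinguishes a hang (28, as during the freeze) from the proxy being down entirely (7).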

Code:
root@pbs:~# proxmox-backup-manager versions --verbose
proxmox-backup                      4.0.0         running kernel: 6.14.11-1-pve
proxmox-backup-server               4.0.14-1      running version: 4.0.14
proxmox-kernel-helper               9.0.4
pve-kernel-5.11                     7.0-10
proxmox-kernel-6.14.11-1-pve-signed 6.14.11-1
proxmox-kernel-6.14                 6.14.11-1
proxmox-kernel-6.14.8-2-pve-signed  6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed 6.8.12-13
proxmox-kernel-6.8                  6.8.12-13
pve-kernel-5.13.19-2-pve            5.13.19-4
pve-kernel-5.11.22-7-pve            5.11.22-12
pve-kernel-5.11.22-1-pve            5.11.22-2
ifupdown2                           3.3.0-1+pmx10
libjs-extjs                         7.0.0-5
proxmox-backup-docs                 4.0.14-1
proxmox-backup-client               4.0.14-1
proxmox-mail-forward                1.0.2
proxmox-mini-journalreader          1.6
proxmox-offline-mirror-helper       0.7.1
proxmox-widget-toolkit              5.0.5
pve-xtermjs                         5.5.0-2
smartmontools                       7.4-pve1
zfsutils-linux                      2.3.4-pve1

root@pbs:~# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
s3-cache            10.7G   235G    96K  /mnt/datastore/s3-cache
s3-cache/D  1.50G   127G  1.50G  /mnt/datastore/s3-cache/DATASTORE_D
s3-cache/R      9.18G   119G  9.18G  /mnt/datastore/s3-cache/DATASTORE_R

root@pbs:~# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
s3-cache   254G  10.7G   243G        -         -    25%     4%  1.00x    ONLINE  -

root@pbs:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  3.9G     0  3.9G   0% /dev
tmpfs                 795M  1.5M  793M   1% /run
/dev/mapper/pbs-root   24G  8.2G   14G  38% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                 1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
tmpfs                 3.9G     0  3.9G   0% /tmp
/dev/sdc1             1.3T  296G  961G  24% /mnt/datastore/pbs-local
tmpfs                 1.0M     0  1.0M   0% /run/credentials/getty@tty1.service
s3-cache              236G  128K  236G   1% /mnt/datastore/s3-cache
s3-cache/D    128G  1.5G  127G   2% /mnt/datastore/s3-cache/DATASTORE_D
s3-cache/R        128G  9.2G  119G   8% /mnt/datastore/s3-cache/DATASTORE_R
tmpfs                 795M  8.0K  795M   1% /run/user/0

root@pbs:~# proxmox-backup-manager s3 endpoint list --output-format json-pretty
[
  {
    "access-key": "XXXXX",
    "endpoint": "{{bucket}}.s3.{{region}}.io.cloud.ovh.net",
    "id": "OVHCloud",
    "provider-quirks": [
      "skip-if-none-match-header"
    ],
    "region": "waw"
  },
  {
    "access-key": "YYYYY",
    "endpoint": "{{bucket}}.s3.{{region}}.io.cloud.ovh.net",
    "id": "OVHCloud-D",
    "provider-quirks": [
      "skip-if-none-match-header"
    ],
    "region": "waw"
  }
]

root@pbs:~# proxmox-backup-manager datastore list --output-format json-pretty
[
  {
    "backend": "type=s3,client=OVHCloud-D,bucket=proxmoxbackupserver-d",
    "gc-schedule": "11:00",
    "name": "digityser",
    "path": "/mnt/datastore/s3-cache/DATASTORE_D"
  },
  {
    "backend": "bucket=pbs-r,client=OVHCloud,type=s3",
    "comment": "",
    "gc-schedule": "daily",
    "name": "DATASTORE_R",
    "notification-mode": "notification-system",
    "path": "/mnt/datastore/s3-cache/DATASTORE_R"
  },
  {
    "gc-schedule": "daily",
    "name": "pbs-local",
    "notification-mode": "legacy-sendmail",
    "notify": "gc=error,prune=error,sync=error,verify=error",
    "path": "/mnt/datastore/pbs-local"
  }
]

I'm somewhat lost because I can't find an explanation for the issue in the logs. Any ideas?

Thanks.
Best,
Maiko
 
Code:
Sep 06 00:00:25 pbs proxmox-backup-proxy[723]: re-opening auth-log file
Sep 06 00:15:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.038 seconds)
Sep 06 00:45:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.041 seconds)
Sep 06 01:15:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.018 seconds)
Sep 06 01:45:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.040 seconds)
Sep 06 02:15:48 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.045 seconds)
Sep 06 02:45:49 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.046 seconds)
Sep 06 03:15:13 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/100/2025-09-06T01:15:05Z/client.log.blob
Sep 06 03:15:46 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' vm/201/2025-09-06T01:15:13Z/client.log.blob
Sep 06 03:15:49 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.081 seconds)
Sep 06 03:17:17 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/301/2025-09-06T01:15:46Z/client.log.blob
Sep 06 03:17:33 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' vm/116/2025-09-06T01:15:06Z/client.log.blob
Sep 06 03:17:47 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/999/2025-09-06T01:17:17Z/client.log.blob
Sep 06 03:17:59 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/10999/2025-09-06T01:15:05Z/client.log.blob
Sep 06 03:18:56 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/dNet' ct/123/2025-09-06T01:17:33Z/client.log.blob
Sep 06 03:35:09 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' vm/100/2025-09-06T01:35:06Z/client.log.blob
Sep 06 03:35:59 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' ct/102/2025-09-06T01:35:00Z/client.log.blob
Sep 06 03:36:17 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' ct/153/2025-09-06T01:35:59Z/client.log.blob
Sep 06 03:36:23 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' vm/254/2025-09-06T01:36:17Z/client.log.blob
Sep 06 03:37:49 pbs proxmox-backup-proxy[723]: Upload backup log to datastore 'pbs-local', namespace 'clusters/Lyoko' ct/103/2025-09-06T01:35:07Z/client.log.blob
Sep 06 03:45:49 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.097 seconds)
Sep 06 04:15:49 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.101 seconds)
Sep 06 04:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.014 seconds)
Sep 06 05:15:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.015 seconds)
Sep 06 05:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.041 seconds)
Sep 06 06:15:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.046 seconds)
Sep 06 06:45:50 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.039 seconds)
Sep 06 07:15:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.127 seconds)
Sep 06 07:45:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.015 seconds)
Sep 06 08:15:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.063 seconds)
Sep 06 08:45:51 pbs proxmox-backup-proxy[723]: rrd journal successfully committed (41 files in 0.057 seconds)
Sep 06 09:11:48 pbs proxmox-backup-proxy[723]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/303/2025-09-05T04:13:51Z/index.json.blob"' - No such fil>
Sep 06 09:19:31 pbs systemd[1]: Stopping proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:19:31 pbs systemd[1]: proxmox-backup-proxy.service: Deactivated successfully.
Sep 06 09:19:31 pbs systemd[1]: Stopped proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:19:31 pbs systemd[1]: proxmox-backup-proxy.service: Consumed 3h 58min 7.380s CPU time, 3G memory peak, 52.4M memory swap peak.
Sep 06 09:19:31 pbs systemd[1]: Starting proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: applied rrd journal (6596 entries in 0.016 seconds)
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: catching shutdown signal
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: catching reload signal
Sep 06 09:19:32 pbs systemd[1]: Started proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: rrd journal successfully committed (41 files in 0.113 seconds)
Sep 06 09:19:32 pbs proxmox-backup-proxy[20945]: Using datastore cache with capacity 8096 for store digityser
Sep 06 09:19:52 pbs proxmox-backup-proxy[20945]: Upload backup log to datastore 'digityser', namespace 'NAMESPACE_D' vm/200/2025-09-06T07:19:32Z/client.log.blob
Sep 06 09:26:10 pbs systemd[1]: Stopping proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:26:10 pbs systemd[1]: proxmox-backup-proxy.service: Deactivated successfully.
Sep 06 09:26:10 pbs systemd[1]: Stopped proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:26:10 pbs systemd[1]: proxmox-backup-proxy.service: Consumed 1.493s CPU time, 78.7M memory peak.
Sep 06 09:26:10 pbs systemd[1]: Starting proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: catching shutdown signal
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: catching reload signal
Sep 06 09:26:10 pbs systemd[1]: Started proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: applied rrd journal (166 entries in 0.018 seconds)
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: rrd journal successfully committed (41 files in 0.030 seconds)
Sep 06 09:26:10 pbs proxmox-backup-proxy[21046]: Using datastore cache with capacity 8096 for store DATASTORE_D
Sep 06 09:26:11 pbs proxmox-backup-proxy[21046]: Using datastore cache with capacity 7604 for store DATASTORE_R
Sep 06 09:26:34 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/302/2025-09-06T07:12:28Z/index.json.blob"' - No such f>
Sep 06 09:26:34 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/101/2025-09-06T07:12:28Z/index.json.blob"' - No such f>
Sep 06 09:26:35 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/303/2025-09-06T07:19:53Z/index.json.blob"' - No such f>
Sep 06 09:26:35 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/vm/303/2025-09-05T04:13:51Z/index.json.blob"' - No such f>
Sep 06 09:26:35 pbs proxmox-backup-proxy[21046]: error during snapshot file listing: 'unable to load blob '"/mnt/datastore/s3-cache/DATASTORE_D/ns/NAMESPACE_D/ct/999/2025-09-06T07:19:33Z/index.json.blob"' - No such f>
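Those recurring "unable to load blob" lines can at least be collected in one pass. A small sketch (the sed pattern is derived from the log format above and is my own assumption, not an official tool):

```shell
# Extract the missing-blob paths from proxmox-backup-proxy journal lines, e.g.:
#   journalctl -u proxmox-backup-proxy | extract_missing_blobs | sort -u
extract_missing_blobs() {
  sed -n "s/.*unable to load blob '\"\([^\"]*\)\"'.*/\1/p"
}
```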

Backup job on the PVE nodes:
Code:
INFO: starting new backup job: vzdump 101 501 999 200 301 302 303 --mailto hello@maiko-bossuyt.eu --all 0 --zstd 0 --compress zstd --mailnotification always --storage pbs --quiet 1 --node proxmox-3 --mode snapshot --fleecing '1,storage=ceph'
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2025-09-06 07:12:28
INFO: status = running
INFO: VM Name: plesk
INFO: include disk 'scsi0' 'ceph:vm-101-disk-1' 600G
INFO: include disk 'efidisk0' 'omv_backup_digilan:101/vm-101-disk-0.raw' 128K
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/101/2025-09-06T07:12:28Z'
/dev/rbd1
INFO: drive-scsi0: attaching fleecing image ceph:vm-101-fleece-0 to QEMU
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 101 qmp command 'backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
Removing image: 1% complete...
...
Removing image: 100% complete...done.
ERROR: Backup of VM 101 failed - VM 101 qmp command 'backup' failed - got timeout
INFO: Failed at 2025-09-06 07:19:32
INFO: Starting Backup of VM 200 (qemu)
INFO: Backup started at 2025-09-06 07:19:32
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: base-debian11
INFO: include disk 'scsi0' 'ceph:base-200-disk-0' 32G
/dev/rbd1
INFO: creating Proxmox Backup Server archive 'vm/200/2025-09-06T07:19:32Z'
INFO: starting kvm to execute backup task
/dev/rbd8
generating cloud-init ISO
INFO: started backup task '565f2a95-3b5c-4893-9746-cf161bbc24a6'
INFO:  17% (5.5 GiB of 32.0 GiB) in 3s, read: 1.8 GiB/s, write: 0 B/s
INFO:  31% (10.2 GiB of 32.0 GiB) in 6s, read: 1.6 GiB/s, write: 0 B/s
INFO:  55% (17.9 GiB of 32.0 GiB) in 10s, read: 1.9 GiB/s, write: 0 B/s
INFO:  80% (25.7 GiB of 32.0 GiB) in 13s, read: 2.6 GiB/s, write: 0 B/s
INFO:  98% (31.4 GiB of 32.0 GiB) in 16s, read: 1.9 GiB/s, write: 0 B/s
INFO: 100% (32.0 GiB of 32.0 GiB) in 17s, read: 636.0 MiB/s, write: 0 B/s
INFO: backup is sparse: 29.25 GiB (91%) total zero data
INFO: backup was done incrementally, reused 32.00 GiB (100%)
INFO: transferred 32.00 GiB in 17 seconds (1.9 GiB/s)
INFO: stopping kvm after backup task
INFO: Finished Backup of VM 200 (00:00:20)
INFO: Backup finished at 2025-09-06 07:19:52
INFO: Starting Backup of VM 303 (qemu)
INFO: Backup started at 2025-09-06 07:19:53
INFO: status = running
INFO: VM Name: c3.cloud
INFO: include disk 'scsi0' 'ceph:vm-303-disk-1' 32G
INFO: include disk 'efidisk0' 'ceph:vm-303-disk-0' 528K
INFO: include disk 'tpmstate0' 'ceph:vm-303-disk-2' 4M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/303/2025-09-06T07:19:53Z'
INFO: attaching TPM drive to QEMU for backup
/dev/rbd1
INFO: drive-scsi0: attaching fleecing image ceph:vm-303-fleece-0 to QEMU
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '99170614-5fbd-4a64-9706-e785ebc03f32'
INFO: resuming VM again
INFO: efidisk0: dirty-bitmap status: OK (drive clean)
INFO: scsi0: dirty-bitmap status: OK (296.0 MiB of 32.0 GiB dirty)
INFO: tpmstate0-backup: dirty-bitmap status: created new
INFO: using fast incremental mode (dirty-bitmap), 300.0 MiB dirty of 32.0 GiB total
INFO: 100% (300.0 MiB of 300.0 MiB) in 3s, read: 100.0 MiB/s, write: 94.7 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job

INFO: Starting Backup of VM 302 (qemu)
INFO: Backup started at 2025-09-06 09:12:28
INFO: status = running
INFO: VM Name: c2.cloud
INFO: include disk 'scsi0' 'ceph:vm-302-disk-1' 32G
INFO: include disk 'efidisk0' 'ceph:vm-302-disk-0' 528K
INFO: include disk 'tpmstate0' 'ceph:vm-302-disk-2' 4M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/302/2025-09-06T07:12:28Z'
INFO: attaching TPM drive to QEMU for backup
trying to acquire cfs lock 'storage-ceph' ...
/dev/rbd5
INFO: drive-scsi0: attaching fleecing image ceph:vm-302-fleece-0 to QEMU
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '12324d0c-f399-4c45-81ba-4acfa84227ce'
INFO: resuming VM again
INFO: efidisk0: dirty-bitmap status: OK (drive clean)
INFO: scsi0: dirty-bitmap status: OK (312.0 MiB of 32.0 GiB dirty)
INFO: tpmstate0-backup: dirty-bitmap status: created new
INFO: using fast incremental mode (dirty-bitmap), 316.0 MiB dirty of 32.0 GiB total
INFO: 100% (316.0 MiB of 316.0 MiB) in 3s, read: 105.3 MiB/s, write: 102.7 MiB/s
ERROR: backup close image failed: command error: stream closed because of a broken pipe
INFO: aborting backup job
INFO: resuming VM again
trying to acquire cfs lock 'storage-ceph' ...
Removing image: 1% complete...
...
Removing image: 100% complete...done.
ERROR: Backup of VM 302 failed - backup close image failed: command error: stream closed because of a broken pipe
INFO: Failed at 2025-09-06 09:19:33



root@proxmox-3:~# pveversion --verbose
proxmox-ve: 8.3.0 (running kernel: 6.8.12-1-pve)
pve-manager: 8.3.2 (running version: 8.3.2/3e76eec21c4a14a7)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-15
proxmox-kernel-6.8: 6.8.12-5
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
pve-kernel-5.4: 6.4-12
pve-kernel-5.15.158-2-pve: 5.15.158-2
pve-kernel-5.4.162-1-pve: 5.4.162-2
ceph: 18.2.4-pve3
ceph-fuse: 18.2.4-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.3
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-2
pve-ha-manager: 4.0.6
pve-i18n: 3.3.2
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
pve-zsync: 2.3.1
qemu-server: 8.3.3
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1
 
I have upgraded PVE on that cluster and at first could not reproduce the issue anymore. The issue is still present though, see #

was:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-1-pve),
proxmox-backup-client: 3.3.2-1

now: pve-manager: 8.4.12 (running version: 8.4.12/c2ea8261d32a5020),
proxmox-backup-client: 3.4.6-1

I'll keep the post open while waiting for all my scheduled jobs to run tonight (concurrency seems to be key to triggering this behaviour) and will update the post tomorrow.
 
I can confirm the PBS freeze.
I can successfully back up VMs with a disk size of 32 GB or smaller to the S3 datastore.
When I back up a larger VM (currently a Windows 11 guest with a 64 GB disk), it goes all the way to 100% (shown in the task viewer on the Proxmox node) and then freezes. If I have the task viewer open on the PBS at the same time, I see the chunks passing by, but the moment the backup reaches 100% and the node-side backup freezes, the chunk list freezes as well. The dashboard also freezes and times out; only a reboot brings it back. In PBS the backup shows up with size 0.

When I close the task viewer on the PBS and re-open it, it now shows:


Screenshot 2025-09-06 162335.jpg
 
OK, an addition: I set the Skip If-None-Match header option for the S3 endpoint I am using, and now I can back up the 64 GB VM.
No more freezes. I will investigate further.
 
So the issue persists.

I currently have three clusters targeting two different datastores, both backed by different buckets on OVHCloud S3 Object Storage.

Cluster 1:
Code:
pve-manager: 8.4.5 (running version: 8.4.5/57892e8e686cb35b)
proxmox-backup-client: 3.4.3-1

Cluster 2:
Code:
pve-manager: 8.4.12 (running version: 8.4.12/c2ea8261d32a5020)
proxmox-backup-client: 3.4.6-1

Cluster 3:
Code:
pve-manager: 9.0.6 (running version: 9.0.6/49c767b70aeb6648)
proxmox-backup-client: 4.0.14-1


Is there a known incompatibility between proxmox-backup-client 3.4.x and PBS 4 with an S3-backed datastore?
The backup jobs on Cluster 3 (PVE 9) work fine, but PBS freezes while a backup job is running on either Cluster 1 or Cluster 2.


Code:
Sep 10 00:27:02 pbs proxmox-backup-proxy[21046]: rrd journal successfully committed (41 files in 0.015 seconds)
Sep 10 00:57:02 pbs proxmox-backup-proxy[21046]: rrd journal successfully committed (41 files in 0.013 seconds)
Sep 10 01:27:02 pbs proxmox-backup-proxy[21046]: rrd journal successfully committed (41 files in 0.016 seconds)
Sep 10 01:36:25 pbs proxmox-backup-proxy[21046]: Upload backup log to datastore 'DATASTORE_R', namespace 'clusters/Cluster3' ct/102/2025-09-09T23:35:01Z/client.log.blob
Sep 10 01:36:49 pbs proxmox-backup-proxy[21046]: Upload backup log to datastore 'DATASTORE_R', namespace 'clusters/Cluster3' ct/153/2025-09-09T23:36:26Z/client.log.blob
Sep 10 01:36:56 pbs proxmox-backup-proxy[21046]: Upload backup log to datastore 'DATASTORE_R', namespace 'clusters/Cluster3' vm/254/2025-09-09T23:36:49Z/client.log.blob
Sep 10 01:37:33 pbs proxmox-backup-proxy[21046]: Upload backup log to datastore 'DATASTORE_R', namespace 'clusters/Cluster3' ct/103/2025-09-09T23:35:01Z/client.log.blob
Sep 10 01:57:02 pbs proxmox-backup-proxy[21046]: rrd journal successfully committed (41 files in 0.032 seconds)
Sep 10 02:27:03 pbs proxmox-backup-proxy[21046]: rrd journal successfully committed (41 files in 0.017 seconds)
Sep 10 02:57:03 pbs proxmox-backup-proxy[21046]: rrd journal successfully committed (41 files in 0.045 seconds)
Sep 10 03:15:16 pbs proxmox-backup-proxy[21046]: Upload backup log to datastore 'DATASTORE_R', namespace 'clusters/Cluster2' ct/100/2025-09-10T01:15:01Z/client.log.blob
Sep 10 03:15:40 pbs proxmox-backup-proxy[21046]: Upload backup log to datastore 'DATASTORE_R', namespace 'clusters/Cluster2' ct/3001/2025-09-10T01:15:01Z/client.log.blob
Sep 10 11:13:31 pbs systemd[1]: Stopping proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 10 11:13:32 pbs systemd[1]: proxmox-backup-proxy.service: Deactivated successfully.
Sep 10 11:13:32 pbs systemd[1]: Stopped proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 10 11:13:32 pbs systemd[1]: proxmox-backup-proxy.service: Consumed 17h 4min 59.466s CPU time, 2.1G memory peak, 182.9M memory swap peak.
Sep 10 11:13:32 pbs systemd[1]: Starting proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 10 11:13:32 pbs proxmox-backup-proxy[83359]: catching shutdown signal
Sep 10 11:13:32 pbs proxmox-backup-proxy[83359]: catching reload signal
Sep 10 11:13:32 pbs systemd[1]: Started proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 10 11:13:32 pbs proxmox-backup-proxy[83359]: Using datastore cache with capacity 6679 for store pbs.rqsco
Sep 10 11:13:32 pbs proxmox-backup-proxy[83359]: applied rrd journal (5404 entries in 0.411 seconds)
Sep 10 11:13:32 pbs proxmox-backup-proxy[83359]: rrd journal successfully committed (41 files in 0.056 seconds)
Sep 10 11:13:35 pbs proxmox-backup-proxy[83359]: Using datastore cache with capacity 5460 for store DATASTORE_D
Sep 10 11:43:32 pbs proxmox-backup-proxy[83359]: rrd journal successfully committed (41 files in 0.052 seconds)
Sep 10 12:13:00 pbs proxmox-backup-proxy[83359]: Using datastore cache with capacity 6679 for store DATASTORE_R
Sep 10 12:13:05 pbs proxmox-backup-proxy[83359]: Using datastore cache with capacity 5460 for store  DATASTORE_D
Sep 10 12:13:32 pbs proxmox-backup-proxy[83359]: rrd journal successfully committed (41 files in 0.045 seconds)

This is the only line in the log while PBS is frozen:
Sep 10 03:15:40 pbs proxmox-backup-proxy[21046]: Upload backup log to datastore 'DATASTORE_R', namespace 'clusters/Cluster2' ct/3001/2025-09-10T01:15:01Z/client.log.blob

I can't find anything else in the logs.
 
It froze again this morning. I didn't restart the proxy service until late yesterday to see if it would recover by itself, but it did not.
 
I also have this issue. PBS freezes partway through; below is the log of the sync job, which just stops there.
I can see in S3 that some of the chunks made it, but not all. Only about 2.5 backups made it out of roughly 160.

- The Skip If-None-Match header quirk on the S3 endpoint does not fix it
- PBS 4.0.14


Code:
2025-09-12T08:31:03-05:00: Starting datastore sync job '-:pbs-backups:s3-pbs::s-1697889e-62d0'
2025-09-12T08:31:03-05:00: sync datastore 's3-pbs' from 'pbs-backups'
2025-09-12T08:31:03-05:00: ----
2025-09-12T08:31:03-05:00: Syncing datastore 'pbs-backups', root namespace into datastore 's3-pbs', root namespace
2025-09-12T08:31:03-05:00: found 16 groups to sync (out of 16 total)
2025-09-12T08:31:04-05:00: re-sync snapshot ct/100/2025-08-31T06:00:03Z
2025-09-12T08:31:04-05:00: no data changes
2025-09-12T08:31:04-05:00: percentage done: 0.62% (0/16 groups, 1/10 snapshots in group #1)
2025-09-12T08:31:04-05:00: re-sync snapshot ct/100/2025-09-05T06:00:02Z
2025-09-12T08:31:04-05:00: sync archive catalog.pcat1.didx
2025-09-12T08:31:04-05:00: downloaded 0 B (0 B/s)
2025-09-12T08:31:04-05:00: percentage done: 1.25% (0/16 groups, 2/10 snapshots in group #1)
2025-09-12T08:31:04-05:00: sync snapshot ct/100/2025-09-06T06:00:06Z
2025-09-12T08:31:04-05:00: sync archive pct.conf.blob
2025-09-12T08:31:04-05:00: sync archive root.pxar.didx
2025-09-12T08:31:09-05:00: downloaded 29.281 MiB (6.36 MiB/s)
2025-09-12T08:31:09-05:00: sync archive catalog.pcat1.didx


Maybe relevant? This might be a red herring because I killed PBS in the middle of a sync job. I have set my S3 cache dir to `/etc/s3-cache`.
Code:
root@pbs:~# journalctl -xe | grep error
Sep 12 09:27:54 pbs proxmox-backup-proxy[218]: error during snapshot file listing: 'unable to load blob '"/etc/s3-cache/ct/100/2025-09-06T06:00:06Z/index.json.blob"' - No such file or directory (os error 2)'
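If that stale cache entry keeps producing errors, one thing I might try (just a sketch on my side — this assumes the local S3 cache is purely disposable and gets re-populated from the bucket, which I haven't verified) is dropping the affected snapshot's cached metadata while the proxy is stopped:

Code:
# Sketch only: assumes the S3 cache can safely be rebuilt from the bucket.
systemctl stop proxmox-backup-proxy
rm -rf '/etc/s3-cache/ct/100/2025-09-06T06:00:06Z'
systemctl start proxmox-backup-proxy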
 
Last edited:
There was a bug that led to the proxmox-backup-proxy process hanging on the backup task log upload (which happens after 100%, which is why you see this as the last line in the task log). The bugfix has already been applied and will be packaged with proxmox-backup 4.0.15.
 
Hi

Same behavior after upgrade to PBS 4.0.15


Code:
root@pbs:~#  proxmox-backup-manager versions --verbose
proxmox-backup                      4.0.0         running kernel: 6.14.11-1-pve
proxmox-backup-server               4.0.15-1      running version: 4.0.15    
proxmox-kernel-helper               9.0.4                                    
pve-kernel-5.11                     7.0-10                                    
proxmox-kernel-6.14.11-2-pve-signed 6.14.11-2                                
proxmox-kernel-6.14                 6.14.11-2                                
proxmox-kernel-6.14.11-1-pve-signed 6.14.11-1                                
proxmox-kernel-6.8.12-13-pve-signed 6.8.12-13                                
proxmox-kernel-6.8                  6.8.12-13                                
pve-kernel-5.13.19-2-pve            5.13.19-4                                
pve-kernel-5.11.22-7-pve            5.11.22-12                                
pve-kernel-5.11.22-1-pve            5.11.22-2                                
ifupdown2                           3.3.0-1+pmx10                            
libjs-extjs                         7.0.0-5                                  
proxmox-backup-docs                 4.0.15-1                                  
proxmox-backup-client               4.0.15-1                                  
proxmox-mail-forward                1.0.2                                    
proxmox-mini-journalreader          1.6                                      
proxmox-offline-mirror-helper       0.7.2                                    
proxmox-widget-toolkit              5.0.5                                    
pve-xtermjs                         5.5.0-2                                  
smartmontools                       7.4-pve1                                  
zfsutils-linux                      2.3.4-pve1

Code:
Sep 19 08:18:39 pbs proxmox-backup-proxy[217464]: Upload backup log to datastore 'DATASTORE_R', namespace 'clusters/NS1' ct/9253/2025-09-19T06:17:35Z/clie>
Sep 19 13:19:14 pbs systemd[1]: proxmox-backup-proxy.service: Killing process 217464 (proxmox-backup-) with signal SIGKILL.
Sep 19 13:19:14 pbs systemd[1]: proxmox-backup-proxy.service: Killing process 227955 (tokio-runtime-w) with signal SIGKILL.
Sep 19 13:19:14 pbs systemd[1]: proxmox-backup-proxy.service: Killing process 228051 (tokio-runtime-w) with signal SIGKILL.
Sep 19 13:19:14 pbs systemd[1]: Stopping proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 19 13:19:14 pbs systemd[1]: proxmox-backup-proxy.service: Deactivated successfully.
Sep 19 13:19:14 pbs systemd[1]: Stopped proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.
Sep 19 13:19:14 pbs systemd[1]: proxmox-backup-proxy.service: Consumed 1d 4h 10min 53.997s CPU time, 2G memory peak, 226.1M memory swap peak.
Sep 19 13:19:14 pbs systemd[1]: Starting proxmox-backup-proxy.service - Proxmox Backup API Proxy Server...
Sep 19 13:19:14 pbs proxmox-backup-proxy[230903]: catching shutdown signal
Sep 19 13:19:14 pbs proxmox-backup-proxy[230903]: catching reload signal
Sep 19 13:19:14 pbs systemd[1]: Started proxmox-backup-proxy.service - Proxmox Backup API Proxy Server.


What surprises me is that I tested CleverCloud S3 storage and I didn't get any freeze/crash, and now that I've come back to OVHCloud S3, the freezes and crashes start again. One of the differences is that I had to enable the provider quirk "Skip If-None-Match header" for OVH.

Also, I already tried the put-rate-limit; it is currently set to 3.

Code:
s3-endpoint: OVHCloud
    access-key XX
    endpoint {{bucket}}.s3.{{region}}.io.cloud.ovh.net
    provider-quirks skip-if-none-match-header
    put-rate-limit 3
    region waw
    secret-key YY

Is there a way to get more verbose logs to understand what is happening? I want to help!
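One thing I considered trying (an assumption on my part — I'm not certain the proxy honors this variable the way the client does, so treat it as a sketch) is raising the log level via a systemd drop-in:

Code:
# /etc/systemd/system/proxmox-backup-proxy.service.d/logging.conf
# Assumption: proxmox-backup-proxy reads PBS_LOG like the client tools do.
[Service]
Environment=PBS_LOG=debug

followed by systemctl daemon-reload && systemctl restart proxmox-backup-proxy.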
 
Last edited:
Hi, please try to generate an strace as described here [0] and post the VM config (qm config <VMID> --current).

[0] https://forum.proxmox.com/threads/s3-endpoint-with-google-cloud-storage.171814/post-802980

Hi Chris

Okay, so PBS is frozen again this morning.

Multiple VMs were in the process of being backed up when it froze, and the backup jobs on PVE are currently stuck as well:
Cluster 1, VM 116 - Stuck at 42%
Cluster 1, VM 201 - Stuck at 100%

The S3 cache is a ZFS dataset; the host's local storage is NVMe, also backed by ZFS.

Code:
root@pve5:~# qm config 201 --current
agent: 1
balloon: 4096
bios: ovmf
boot: order=scsi0
cores: 2
cpu: host,flags=+aes
cpuunits: 5000
efidisk0: rpool:vm-201-disk-1,efitype=4m,pre-enrolled-keys=1,size=528K
hotplug: 0
localtime: 0
lock: backup
machine: pc-q35-9.2
memory: 8192
meta: creation-qemu=7.2.0,ctime=1699154588
name: winadm
net1: virtio=72:80:E9:E2:9C:E1,bridge=vmbr1
numa: 0
onboot: 1
ostype: win11
parent: OK
scsi0: rpool:vm-201-disk-2,discard=on,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=675eafef-b054-4b16-af3c-27a2cb32500d
sockets: 1
spice_enhancements: videostreaming=all
startup: order=2,up=30,down=30
tablet: 0
tpmstate0: rpool:vm-201-disk-0,size=4M,version=v2.0
vga: std,memory=128
vmgenid: d8605553-faed-4474-ad6b-c71ed7f1ea04


root@perseverance:~# qm config 116 --current
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0
cores: 4
cpu: x86-64-v2-AES,flags=+aes
cpuunits: 10000
hotplug: 0
keyboard: fr-be
lock: backup
memory: 32768
name: XX
net0: virtio=02:00:00:6d:5b:ad,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: rpool:vm-116-disk-0,discard=on,iothread=1,size=1000G
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=ebceb126-a109-4e08-ad84-abfec052666b
sockets: 1
startup: order=1,up=30,down=90
tablet: 0


root@pbs:~# strace -ftt  -p $(pidof proxmox-backup-proxy)
strace: Process 144871 attached with 9 threads
[pid 176771] 10:07:00.586162 futex(0x7dc9200becf0, FUTEX_WAIT_BITSET_PRIVATE, 2, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 176770] 10:07:00.586492 futex(0x7dc9500022a8, FUTEX_WAIT_BITSET_PRIVATE, 4294967295, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 176759] 10:07:00.586516 futex(0x7dc9200becf0, FUTEX_WAIT_BITSET_PRIVATE, 2, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 176738] 10:07:00.586534 futex(0x7dc9200becf0, FUTEX_WAIT_BITSET_PRIVATE, 2, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 171289] 10:07:00.586550 futex(0x7dc9200becf0, FUTEX_WAIT_BITSET_PRIVATE, 2, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 145002] 10:07:00.586567 futex(0x7dc9200becf0, FUTEX_WAIT_BITSET_PRIVATE, 2, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 144879] 10:07:00.586587 restart_syscall(<... resuming interrupted clock_nanosleep ...> <unfinished ...>
[pid 144871] 10:07:00.586601 futex(0x7dc96a619c28, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 147558] 10:07:00.586621 futex(0x7dc9200becf0, FUTEX_WAIT_BITSET_PRIVATE, 2, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 144879] 10:07:02.409980 <... restart_syscall resumed>) = 0
[pid 144879] 10:07:02.410049 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:05.410587 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:08.410987 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:11.411363 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:14.411724 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:17.412163 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:20.412617 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:23.413169 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:26.413659 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:29.414044 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:32.414616 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:35.414946 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:38.415512 mprotect(0x7dc9481f2000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 144879] 10:07:38.415693 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:41.416062 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:44.416439 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:47.417035 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:50.417561 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:53.417927 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:56.418359 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:07:59.418787 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:08:02.419161 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:08:05.419536 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:08:08.419967 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:08:11.420371 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:08:14.420824 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, 0x7dc9699fa7c0) = 0
[pid 144879] 10:08:17.421307 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3, tv_nsec=0}, ^Cstrace: Process 176770 detached
strace: Process 176771 detached
strace: Process 176759 detached
strace: Process 176738 detached
strace: Process 171289 detached
strace: Process 147558 detached
strace: Process 145002 detached
strace: Process 144879 detached
 <detached ...>
strace: Process 144871 detached

root@pbs:~# proxmox-backup-manager version --verbose
proxmox-backup                      4.0.0         running kernel: 6.14.11-2-pve
proxmox-backup-server               4.0.15-1      running version: 4.0.15
proxmox-kernel-helper               9.0.4
pve-kernel-5.11                     7.0-10
proxmox-kernel-6.14.11-2-pve-signed 6.14.11-2
proxmox-kernel-6.14                 6.14.11-2
proxmox-kernel-6.14.11-1-pve-signed 6.14.11-1
proxmox-kernel-6.8.12-13-pve-signed 6.8.12-13
proxmox-kernel-6.8                  6.8.12-13
pve-kernel-5.13.19-2-pve            5.13.19-4
pve-kernel-5.11.22-7-pve            5.11.22-12
pve-kernel-5.11.22-1-pve            5.11.22-2
ifupdown2                           3.3.0-1+pmx10
libjs-extjs                         7.0.0-5
proxmox-backup-docs                 4.0.15-1
proxmox-backup-client               4.0.15-1
proxmox-mail-forward                1.0.2
proxmox-mini-journalreader          1.6
proxmox-offline-mirror-helper       0.7.2
proxmox-widget-toolkit              5.0.5
pve-xtermjs                         5.5.0-2
smartmontools                       7.4-pve1
zfsutils-linux                      2.3.4-pve1


root@pbs:~# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
s3-cache   254G  82.3G   172G        -         -    35%    32%  1.00x    ONLINE  -


root@pbs:/var/log/proxmox-backup/tasks# cat /var/log/proxmox-backup/tasks/active
UPID:pbs:000235E7:016B95A1:00000032:68D1F497:backup:pbs.rqsco\x3avm-116:cluster_dNet@pbs:
UPID:pbs:000235E7:016B95A1:00000033:68D1F498:backup:pbs.rqsco\x3avm-201:cluster_dNet@pbs:

Logs attached.

I'll run some manual backups and try to catch a freeze while strace is running.
 

Attachments

Last edited:
Thanks a lot for the strace output. Can you also generate a backtrace of all threads in the stuck state? For this, please install gdb and the PBS debug symbols via apt install gdb proxmox-backup-server-dbgsym and run gdb --batch --ex 't a a bt' -p $(pidof proxmox-backup-proxy) > proxy.backtrace
 
Thanks a lot for the strace output. Can you also generate a backtrace of all threads in the stuck state? For this, please install gdb and the PBS debug symbols via apt install gdb proxmox-backup-server-dbgsym and run gdb --batch --ex 't a a bt' -p $(pidof proxmox-backup-proxy) > proxy.backtrace
There you go. Hope this helps.
 

Attachments