[Backups] hang

hac3ru · 2025-12-17T08:03:43+0100

Hello,

I have an issue that seems to have started after updating to PBS 4.1.0. Backups randomly fail for no apparent reason. The PVE backup log:

Code:

INFO: starting new backup job: vzdump --pool CICD --fleecing 0 --mode snapshot --storage PBE_Long_Term --mailnotification always --mailto <email> --quiet 1 --notes-template '{{vmid}}-{{guestname}}' --notification-mode notification-system
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2025-12-16 22:00:03
INFO: status = running
INFO: VM Name: jenkins-worker-01.infra
INFO: include disk 'scsi0' 'Local_23TB:vm-101-disk-0' 300G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/101/2025-12-16T21:00:03Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '369c8266-31d6-4ad4-82cc-58fc025bcb02'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (253.8 GiB of 300.0 GiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 253.8 GiB dirty of 300.0 GiB total
INFO:   0% (316.0 MiB of 253.8 GiB) in 3s, read: 105.3 MiB/s, write: 102.7 MiB/s
INFO:   1% (2.5 GiB of 253.8 GiB) in 1m 20s, read: 29.7 MiB/s, write: 29.7 MiB/s
INFO:   2% (5.1 GiB of 253.8 GiB) in 2m 43s, read: 31.6 MiB/s, write: 31.6 MiB/s
INFO:   3% (7.6 GiB of 253.8 GiB) in 4m 11s, read: 29.4 MiB/s, write: 29.4 MiB/s
INFO:   4% (10.2 GiB of 253.8 GiB) in 5m 36s, read: 31.0 MiB/s, write: 31.0 MiB/s
INFO:   5% (12.8 GiB of 253.8 GiB) in 7m 44s, read: 20.4 MiB/s, write: 20.4 MiB/s
INFO:   6% (15.3 GiB of 253.8 GiB) in 9m 8s, read: 30.9 MiB/s, write: 30.9 MiB/s
INFO:   7% (17.8 GiB of 253.8 GiB) in 10m 20s, read: 35.6 MiB/s, write: 35.6 MiB/s
INFO:   8% (20.3 GiB of 253.8 GiB) in 11m, read: 64.8 MiB/s, write: 64.8 MiB/s
INFO:   9% (22.9 GiB of 253.8 GiB) in 11m 33s, read: 80.1 MiB/s, write: 80.1 MiB/s
INFO:  10% (25.4 GiB of 253.8 GiB) in 12m 2s, read: 89.0 MiB/s, write: 89.0 MiB/s
INFO:  11% (27.9 GiB of 253.8 GiB) in 12m 33s, read: 83.2 MiB/s, write: 83.2 MiB/s
INFO:  12% (30.5 GiB of 253.8 GiB) in 13m 8s, read: 75.8 MiB/s, write: 75.5 MiB/s
INFO:  13% (33.1 GiB of 253.8 GiB) in 13m 43s, read: 74.1 MiB/s, write: 74.1 MiB/s
INFO:  14% (35.6 GiB of 253.8 GiB) in 14m 15s, read: 80.6 MiB/s, write: 80.5 MiB/s
INFO:  15% (38.1 GiB of 253.8 GiB) in 14m 46s, read: 83.6 MiB/s, write: 83.6 MiB/s
INFO:  16% (40.6 GiB of 253.8 GiB) in 15m 43s, read: 44.9 MiB/s, write: 44.9 MiB/s
INFO:  17% (43.2 GiB of 253.8 GiB) in 17m 3s, read: 33.3 MiB/s, write: 33.2 MiB/s
INFO:  18% (45.7 GiB of 253.8 GiB) in 18m 29s, read: 29.4 MiB/s, write: 29.4 MiB/s
INFO:  19% (48.2 GiB of 253.8 GiB) in 20m 6s, read: 26.8 MiB/s, write: 26.8 MiB/s
INFO:  20% (50.8 GiB of 253.8 GiB) in 21m 38s, read: 28.5 MiB/s, write: 28.5 MiB/s
INFO:  21% (53.3 GiB of 253.8 GiB) in 22m 56s, read: 33.3 MiB/s, write: 33.3 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 101 failed - interrupted by signal
INFO: Failed at 2025-12-17 07:51:36
ERROR: Backup job failed - interrupted by signal
INFO: skipping disabled matcher 'default-matcher'
INFO: notified via target `admin-emails`
TASK ERROR: interrupted by signal

This happens almost nightly, to different VMs. Since only one backup can run at a time on a host, the stuck job keeps the other backups from running.
On the PBS, CPU utilization sits at 0-38% and memory utilization at 18-35%.
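
The timestamps in the log above show how long the job sat without progress: the last progress line is at 22m 56s after the 22:00:03 start, while the failure is logged at 07:51:36 the next morning. A minimal Python sketch for computing that gap from a saved task log follows. The function name `idle_before_failure` and the regular expressions are illustrative assumptions based only on the line formats visible in this log, not part of any Proxmox tooling, and it handles a single VM section at a time.

```python
import re
from datetime import datetime, timedelta

TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
START_RE = re.compile(r"INFO: Backup started at " + TS)
FAILED_RE = re.compile(r"INFO: Failed at " + TS)
# Elapsed time as printed in the progress lines: "3s", "1m 20s", "11m".
# The optional "h" field is an assumption; the log above never reaches an hour.
PROGRESS_RE = re.compile(
    r"INFO:\s+\d+% \(.*?\) in (?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?,"
)


def parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def idle_before_failure(log_text):
    """Return (last_progress_time, failed_time, idle) or None if data is missing."""
    started = failed = last_elapsed = None
    for line in log_text.splitlines():
        if (m := START_RE.search(line)):
            started = parse_ts(m.group(1))
        elif (m := FAILED_RE.search(line)):
            failed = parse_ts(m.group(1))
        elif (m := PROGRESS_RE.search(line)):
            h, mi, s = (int(g) if g else 0 for g in m.groups())
            last_elapsed = timedelta(hours=h, minutes=mi, seconds=s)
    if started is None or failed is None or last_elapsed is None:
        return None
    last_progress = started + last_elapsed
    return last_progress, failed, failed - last_progress


# Excerpt of the log posted above.
sample = """\
INFO: Backup started at 2025-12-16 22:00:03
INFO:  20% (50.8 GiB of 253.8 GiB) in 21m 38s, read: 28.5 MiB/s, write: 28.5 MiB/s
INFO:  21% (53.3 GiB of 253.8 GiB) in 22m 56s, read: 33.3 MiB/s, write: 33.3 MiB/s
ERROR: interrupted by signal
INFO: Failed at 2025-12-17 07:51:36
"""
last_progress, failed, idle = idle_before_failure(sample)
print(last_progress, failed, idle)
```

For the excerpt, the idle time comes out at roughly nine and a half hours between the last logged progress and the failure, which points to a stall rather than a slow transfer.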

The PBS stores the backups on an external NFS drive. The NFS itself is fine, as other systems use it without problems.
`dmesg -T` shows nothing on the PBS (the last entry is from December 7th), so I have no idea what's going on.

The PVE hosts are running 8.4.0 and 8.4.5 (I know, an update is planned, but 8.4 is still supported).

Any help is greatly appreciated.

Chris · 2025-12-17T09:08:11+0100

Hi,
you are most likely affected by the kernel bug currently under investigation [0]. Please try installing the test kernels [1,2] and see if one of them fixes the issue for you. Also, please try to capture some debug output when the stall occurs [3], and run `nstat && ss -tim` while it hangs.

If the issue persists, you might revert to an older kernel version for the time being.
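
Since the stall tends to happen overnight, it may help to save the output of the two commands to a file instead of reading it from the terminal. A rough Python sketch, assuming `nstat` and `ss` are on the PATH; the function name `capture_stall_diagnostics` and the `pbs-stall-<timestamp>.txt` file name are made up for illustration:

```python
import subprocess
from datetime import datetime
from pathlib import Path

# The two commands requested above, in the same order.
COMMANDS = (["nstat"], ["ss", "-tim"])


def capture_stall_diagnostics(out_dir=".", runner=subprocess.run):
    """Run each command once and write its output to a timestamped file.

    `runner` is injectable so the function can be exercised without the tools.
    """
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    path = Path(out_dir) / f"pbs-stall-{stamp}.txt"  # hypothetical file name
    with path.open("w") as fh:
        for cmd in COMMANDS:
            result = runner(cmd, capture_output=True, text=True, check=False)
            fh.write(f"$ {' '.join(cmd)}\n")
            fh.write(result.stdout)
            fh.write(result.stderr)
            fh.write("\n")
    return path
```

Calling `capture_stall_diagnostics("/root")` while a backup is hanging would leave a single file that can be attached to the thread.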

[0] https://forum.proxmox.com/threads/176444/
[1] https://forum.proxmox.com/threads/176444/post-824058
[2] https://forum.proxmox.com/threads/176444/post-825297
[3] https://forum.proxmox.com/threads/176444/post-825187
