[Backups] hang

hac3ru

Hello,

I have an issue that seems to have started after updating to PBS 4.1.0: backups randomly fail for no apparent reason. PVE backup log:
Code:
INFO: starting new backup job: vzdump --pool CICD --fleecing 0 --mode snapshot --storage PBE_Long_Term --mailnotification always --mailto <email> --quiet 1 --notes-template '{{vmid}}-{{guestname}}' --notification-mode notification-system
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2025-12-16 22:00:03
INFO: status = running
INFO: VM Name: jenkins-worker-01.infra
INFO: include disk 'scsi0' 'Local_23TB:vm-101-disk-0' 300G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/101/2025-12-16T21:00:03Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '369c8266-31d6-4ad4-82cc-58fc025bcb02'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (253.8 GiB of 300.0 GiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 253.8 GiB dirty of 300.0 GiB total
INFO:   0% (316.0 MiB of 253.8 GiB) in 3s, read: 105.3 MiB/s, write: 102.7 MiB/s
INFO:   1% (2.5 GiB of 253.8 GiB) in 1m 20s, read: 29.7 MiB/s, write: 29.7 MiB/s
INFO:   2% (5.1 GiB of 253.8 GiB) in 2m 43s, read: 31.6 MiB/s, write: 31.6 MiB/s
INFO:   3% (7.6 GiB of 253.8 GiB) in 4m 11s, read: 29.4 MiB/s, write: 29.4 MiB/s
INFO:   4% (10.2 GiB of 253.8 GiB) in 5m 36s, read: 31.0 MiB/s, write: 31.0 MiB/s
INFO:   5% (12.8 GiB of 253.8 GiB) in 7m 44s, read: 20.4 MiB/s, write: 20.4 MiB/s
INFO:   6% (15.3 GiB of 253.8 GiB) in 9m 8s, read: 30.9 MiB/s, write: 30.9 MiB/s
INFO:   7% (17.8 GiB of 253.8 GiB) in 10m 20s, read: 35.6 MiB/s, write: 35.6 MiB/s
INFO:   8% (20.3 GiB of 253.8 GiB) in 11m, read: 64.8 MiB/s, write: 64.8 MiB/s
INFO:   9% (22.9 GiB of 253.8 GiB) in 11m 33s, read: 80.1 MiB/s, write: 80.1 MiB/s
INFO:  10% (25.4 GiB of 253.8 GiB) in 12m 2s, read: 89.0 MiB/s, write: 89.0 MiB/s
INFO:  11% (27.9 GiB of 253.8 GiB) in 12m 33s, read: 83.2 MiB/s, write: 83.2 MiB/s
INFO:  12% (30.5 GiB of 253.8 GiB) in 13m 8s, read: 75.8 MiB/s, write: 75.5 MiB/s
INFO:  13% (33.1 GiB of 253.8 GiB) in 13m 43s, read: 74.1 MiB/s, write: 74.1 MiB/s
INFO:  14% (35.6 GiB of 253.8 GiB) in 14m 15s, read: 80.6 MiB/s, write: 80.5 MiB/s
INFO:  15% (38.1 GiB of 253.8 GiB) in 14m 46s, read: 83.6 MiB/s, write: 83.6 MiB/s
INFO:  16% (40.6 GiB of 253.8 GiB) in 15m 43s, read: 44.9 MiB/s, write: 44.9 MiB/s
INFO:  17% (43.2 GiB of 253.8 GiB) in 17m 3s, read: 33.3 MiB/s, write: 33.2 MiB/s
INFO:  18% (45.7 GiB of 253.8 GiB) in 18m 29s, read: 29.4 MiB/s, write: 29.4 MiB/s
INFO:  19% (48.2 GiB of 253.8 GiB) in 20m 6s, read: 26.8 MiB/s, write: 26.8 MiB/s
INFO:  20% (50.8 GiB of 253.8 GiB) in 21m 38s, read: 28.5 MiB/s, write: 28.5 MiB/s
INFO:  21% (53.3 GiB of 253.8 GiB) in 22m 56s, read: 33.3 MiB/s, write: 33.3 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 101 failed - interrupted by signal
INFO: Failed at 2025-12-17 07:51:36
ERROR: Backup job failed - interrupted by signal
INFO: skipping disabled matcher 'default-matcher'
INFO: notified via target `admin-emails`
TASK ERROR: interrupted by signal
This happens almost nightly, to different VMs, and since only one backup can run at a time on a host, it blocks the remaining backups from running.
CPU utilization on the PBS sits at 0-38% and memory utilization at 18-35%.
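
The timestamps in the log above suggest a long stall rather than a quick failure: the last progress line is at 22m 56s into the job, while the abort is logged the next morning. A minimal sketch for measuring that gap from a saved task log follows. The function names `parse_elapsed` and `stall_duration` are made up for illustration, and the handling of an hours field (`1h 2m 3s`) is an assumption, since the log here only shows minutes and seconds.

```python
import re
from datetime import datetime, timedelta

def parse_elapsed(text):
    """Convert an elapsed string such as '22m 56s' or '11m' to a timedelta."""
    units = {"h": "hours", "m": "minutes", "s": "seconds"}  # 'h' is assumed
    parts = re.findall(r"(\d+)([hms])", text)
    return timedelta(**{units[u]: int(n) for n, u in parts})

def stall_duration(log):
    """Time between the last progress line and the 'Failed at' line, or None."""
    stamp = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    start = re.search(r"Backup started at " + stamp, log)
    failed = re.search(r"Failed at " + stamp, log)
    progress = re.findall(r"INFO:\s+\d+% \(.*?\) in ((?:\d+[hms] ?)+),", log)
    if not (start and failed and progress):
        return None
    fmt = "%Y-%m-%d %H:%M:%S"
    last_progress = datetime.strptime(start.group(1), fmt) + parse_elapsed(progress[-1])
    return datetime.strptime(failed.group(1), fmt) - last_progress

# Excerpt of the task log from this post.
sample = """\
INFO: Backup started at 2025-12-16 22:00:03
INFO:  20% (50.8 GiB of 253.8 GiB) in 21m 38s, read: 28.5 MiB/s, write: 28.5 MiB/s
INFO:  21% (53.3 GiB of 253.8 GiB) in 22m 56s, read: 33.3 MiB/s, write: 33.3 MiB/s
ERROR: interrupted by signal
INFO: Failed at 2025-12-17 07:51:36
"""
print(stall_duration(sample))  # 9:28:37
```

For this job the gap is roughly nine and a half hours between the last progress update and the abort.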

The PBS stores the backups on an external NFS share. The NFS server itself seems fine, as other systems use it without any problems.
`dmesg -T` reveals nothing on the PBS (the last entry is from December 7th), so I have no idea what's going on.

The PVE hosts are running 8.4.0 and 8.4.5 (I know, an update is planned, but 8.4 is still supported).

Any help is greatly appreciated.
 
Hi,
you are most likely affected by the kernel bug currently under investigation [0]. Please try to install the test kernels [1,2] and see if one of them fixes the issue for you. Further, please try to produce some debug output when the stall occurs [3] and run `nstat && ss -tim` while it hangs.

If the issue persists, you might revert to an older kernel version for the time being.

[0] https://forum.proxmox.com/threads/176444/
[1] https://forum.proxmox.com/threads/176444/post-824058
[2] https://forum.proxmox.com/threads/176444/post-825297
[3] https://forum.proxmox.com/threads/176444/post-825187
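
Since the stall tends to happen at night, it may help to have the `nstat` and `ss -tim` output requested above written to a file in one step. The following is a minimal sketch, not an official tool: the `capture` helper and the output file name are hypothetical, and it only covers the two commands named in this reply, not whatever additional debug output is described in [3]. Unlike `nstat && ss -tim`, it runs the second command even if the first one fails.

```python
import subprocess
import tempfile
from datetime import datetime
from pathlib import Path

# The two commands named in the reply; extend as needed.
COMMANDS = [["nstat"], ["ss", "-tim"]]

def capture(commands=COMMANDS, out_dir=None):
    """Run each command and write its output to one timestamped file."""
    out_dir = Path(out_dir or tempfile.gettempdir())
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    out_path = out_dir / f"backup-stall-{stamp}.txt"  # hypothetical file name
    with out_path.open("w") as out:
        for cmd in commands:
            out.write(f"### {' '.join(cmd)}\n")
            try:
                result = subprocess.run(
                    cmd, capture_output=True, text=True, timeout=60
                )
                out.write(result.stdout)
                out.write(result.stderr)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # Command missing or hung; record that and keep going.
                out.write(f"failed to run: {exc}\n")
            out.write("\n")
    return out_path

if __name__ == "__main__":
    print(capture())
```

Run it on the affected machine while the backup task is stuck and attach the resulting file to the thread.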