PVE9 VM suddenly stops when finishing backup to PBS

michalk

Member
Jan 14, 2022
I have 2 hosts with PVE8 and one with PVE9. The problem occurs only on PVE9 (9.1.4), roughly 10-12 days after the VM was last started.

Every day there are 4 backups of the VM (schedule: 5,11,17,23:15) from PVE to PBS. Everything works, and then after 10+ days the VM suddenly stops at 99% of the backup. I have to manually start the VM again, and everything is back to normal for the next 10+ days. There is no error inside the VM - it is simply stopped. This has already happened to me twice.

In backup logs it looks like this:

Bash:
INFO: starting new backup job: vzdump 133 --mailto ****@**** --bwlimit 153600 --mode snapshot --node p** --notification-mode legacy-sendmail --fleecing '1,storage=local' --storage backup-******* --quiet 1 --notes-template '{{guestname}}' --mailnotification failure --prune-backups 'keep-last=4'
INFO: Starting Backup of VM 133 (qemu)
INFO: Backup started at 2026-04-11 23:15:01
INFO: status = running
INFO: VM Name: ****************
INFO: include disk 'scsi0' 'local:133/vm-133-disk-0.raw' 1900G
INFO: include disk 'scsi1' 'local:133/vm-133-disk-1.raw' 1100G
INFO: backup mode: snapshot
INFO: bandwidth limit: 153600 KiB/s
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/133/2026-04-11T21:15:01Z'
Formatting '/var/lib/vz/images/133/vm-133-fleece-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=2040109465600 lazy_refcounts=off refcount_bits=16
Formatting '/var/lib/vz/images/133/vm-133-fleece-1.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=1181116006400 lazy_refcounts=off refcount_bits=16
INFO: drive-scsi0: attaching fleecing image local:133/vm-133-fleece-0.qcow2 to QEMU
INFO: drive-scsi1: attaching fleecing image local:133/vm-133-fleece-1.qcow2 to QEMU
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '************************************'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (63.3 GiB of 1.9 TiB dirty)
INFO: scsi1: dirty-bitmap status: OK (27.7 GiB of 1.1 TiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 91.1 GiB dirty of 2.9 TiB total
INFO:   0% (448.0 MiB of 91.1 GiB) in 3s, read: 149.3 MiB/s, write: 146.7 MiB/s
INFO:   1% (1.0 GiB of 91.1 GiB) in 7s, read: 148.0 MiB/s, write: 148.0 MiB/s
INFO:   2% (1.9 GiB of 91.1 GiB) in 13s, read: 146.7 MiB/s, write: 143.3 MiB/s
INFO:   3% (2.8 GiB of 91.1 GiB) in 19s, read: 149.3 MiB/s, write: 146.0 MiB/s
INFO:   4% (3.8 GiB of 91.1 GiB) in 26s, read: 148.6 MiB/s, write: 147.4 MiB/s
INFO:   5% (4.6 GiB of 91.1 GiB) in 32s, read: 149.3 MiB/s, write: 148.0 MiB/s
INFO:   6% (5.5 GiB of 91.1 GiB) in 38s, read: 149.3 MiB/s, write: 141.3 MiB/s
INFO:   7% (6.4 GiB of 91.1 GiB) in 44s, read: 149.3 MiB/s, write: 129.3 MiB/s
INFO:   8% (7.4 GiB of 91.1 GiB) in 51s, read: 148.6 MiB/s, write: 145.7 MiB/s
INFO:   9% (8.3 GiB of 91.1 GiB) in 57s, read: 149.3 MiB/s, write: 149.3 MiB/s
INFO:  10% (9.2 GiB of 91.1 GiB) in 1m 3s, read: 149.3 MiB/s, write: 148.7 MiB/s
INFO:  11% (10.0 GiB of 91.1 GiB) in 1m 9s, read: 149.3 MiB/s, write: 148.7 MiB/s
INFO:  12% (11.1 GiB of 91.1 GiB) in 1m 16s, read: 150.3 MiB/s, write: 148.0 MiB/s
INFO:  13% (11.9 GiB of 91.1 GiB) in 1m 22s, read: 147.3 MiB/s, write: 146.0 MiB/s
INFO:  14% (12.8 GiB of 91.1 GiB) in 1m 29s, read: 128.0 MiB/s, write: 127.4 MiB/s
INFO:  15% (13.7 GiB of 91.1 GiB) in 1m 35s, read: 149.3 MiB/s, write: 149.3 MiB/s
INFO:  16% (14.7 GiB of 91.1 GiB) in 1m 42s, read: 150.3 MiB/s, write: 150.3 MiB/s
INFO:  17% (15.6 GiB of 91.1 GiB) in 1m 48s, read: 149.3 MiB/s, write: 148.7 MiB/s
INFO:  18% (16.5 GiB of 91.1 GiB) in 1m 54s, read: 150.0 MiB/s, write: 150.0 MiB/s
INFO:  19% (17.3 GiB of 91.1 GiB) in 2m, read: 149.3 MiB/s, write: 148.7 MiB/s
INFO:  20% (18.3 GiB of 91.1 GiB) in 2m 7s, read: 149.1 MiB/s, write: 148.6 MiB/s
INFO:  21% (19.2 GiB of 91.1 GiB) in 2m 13s, read: 151.3 MiB/s, write: 151.3 MiB/s
INFO:  22% (20.1 GiB of 91.1 GiB) in 2m 19s, read: 148.7 MiB/s, write: 147.3 MiB/s
INFO:  23% (21.0 GiB of 91.1 GiB) in 2m 25s, read: 150.0 MiB/s, write: 150.0 MiB/s
INFO:  24% (21.9 GiB of 91.1 GiB) in 2m 31s, read: 149.3 MiB/s, write: 148.0 MiB/s
INFO:  25% (22.9 GiB of 91.1 GiB) in 2m 38s, read: 149.7 MiB/s, write: 149.7 MiB/s
INFO:  26% (23.8 GiB of 91.1 GiB) in 2m 44s, read: 150.7 MiB/s, write: 150.7 MiB/s
INFO:  27% (24.6 GiB of 91.1 GiB) in 2m 50s, read: 149.3 MiB/s, write: 149.3 MiB/s
INFO:  28% (25.5 GiB of 91.1 GiB) in 2m 56s, read: 150.0 MiB/s, write: 150.0 MiB/s
INFO:  29% (26.5 GiB of 91.1 GiB) in 3m 3s, read: 150.3 MiB/s, write: 150.3 MiB/s
INFO:  30% (27.4 GiB of 91.1 GiB) in 3m 9s, read: 150.0 MiB/s, write: 150.0 MiB/s
INFO:  31% (28.3 GiB of 91.1 GiB) in 3m 15s, read: 150.0 MiB/s, write: 143.3 MiB/s
INFO:  32% (29.2 GiB of 91.1 GiB) in 3m 21s, read: 149.3 MiB/s, write: 144.0 MiB/s
INFO:  33% (30.2 GiB of 91.1 GiB) in 3m 28s, read: 148.6 MiB/s, write: 147.4 MiB/s
INFO:  34% (31.1 GiB of 91.1 GiB) in 3m 34s, read: 149.3 MiB/s, write: 146.0 MiB/s
INFO:  35% (31.9 GiB of 91.1 GiB) in 3m 40s, read: 149.3 MiB/s, write: 143.3 MiB/s
INFO:  36% (32.8 GiB of 91.1 GiB) in 3m 46s, read: 148.7 MiB/s, write: 145.3 MiB/s
INFO:  37% (33.8 GiB of 91.1 GiB) in 3m 53s, read: 147.4 MiB/s, write: 145.7 MiB/s
INFO:  38% (34.7 GiB of 91.1 GiB) in 3m 59s, read: 148.7 MiB/s, write: 147.3 MiB/s
INFO:  39% (35.6 GiB of 91.1 GiB) in 4m 5s, read: 149.3 MiB/s, write: 147.3 MiB/s
INFO:  40% (36.4 GiB of 91.1 GiB) in 4m 11s, read: 149.3 MiB/s, write: 143.3 MiB/s
INFO:  41% (37.5 GiB of 91.1 GiB) in 4m 18s, read: 149.1 MiB/s, write: 138.9 MiB/s
INFO:  42% (38.3 GiB of 91.1 GiB) in 4m 24s, read: 148.7 MiB/s, write: 147.3 MiB/s
INFO:  43% (39.2 GiB of 91.1 GiB) in 4m 30s, read: 149.3 MiB/s, write: 136.7 MiB/s
INFO:  44% (40.1 GiB of 91.1 GiB) in 4m 36s, read: 149.3 MiB/s, write: 141.3 MiB/s
INFO:  45% (41.1 GiB of 91.1 GiB) in 4m 43s, read: 148.6 MiB/s, write: 139.4 MiB/s
INFO:  46% (42.0 GiB of 91.1 GiB) in 4m 49s, read: 149.3 MiB/s, write: 132.7 MiB/s
INFO:  47% (42.8 GiB of 91.1 GiB) in 4m 55s, read: 148.0 MiB/s, write: 135.3 MiB/s
INFO:  48% (43.7 GiB of 91.1 GiB) in 5m 1s, read: 150.0 MiB/s, write: 126.0 MiB/s
INFO:  49% (44.7 GiB of 91.1 GiB) in 5m 8s, read: 149.1 MiB/s, write: 121.1 MiB/s
INFO:  50% (45.6 GiB of 91.1 GiB) in 5m 14s, read: 149.3 MiB/s, write: 124.0 MiB/s
INFO:  51% (46.5 GiB of 91.1 GiB) in 5m 20s, read: 148.7 MiB/s, write: 123.3 MiB/s
INFO:  52% (47.4 GiB of 91.1 GiB) in 5m 26s, read: 148.7 MiB/s, write: 130.7 MiB/s
INFO:  53% (48.4 GiB of 91.1 GiB) in 5m 33s, read: 149.7 MiB/s, write: 133.7 MiB/s
INFO:  54% (49.3 GiB of 91.1 GiB) in 5m 39s, read: 149.3 MiB/s, write: 120.7 MiB/s
INFO:  55% (50.1 GiB of 91.1 GiB) in 5m 45s, read: 148.7 MiB/s, write: 131.3 MiB/s
INFO:  56% (51.1 GiB of 91.1 GiB) in 5m 52s, read: 149.1 MiB/s, write: 127.4 MiB/s
INFO:  57% (52.0 GiB of 91.1 GiB) in 5m 58s, read: 146.7 MiB/s, write: 128.7 MiB/s
INFO:  58% (52.9 GiB of 91.1 GiB) in 6m 4s, read: 148.7 MiB/s, write: 124.7 MiB/s
INFO:  59% (53.8 GiB of 91.1 GiB) in 6m 10s, read: 150.0 MiB/s, write: 136.0 MiB/s
INFO:  60% (54.8 GiB of 91.1 GiB) in 6m 17s, read: 149.7 MiB/s, write: 122.3 MiB/s
INFO:  61% (55.6 GiB of 91.1 GiB) in 6m 23s, read: 148.0 MiB/s, write: 132.7 MiB/s
INFO:  62% (56.5 GiB of 91.1 GiB) in 6m 29s, read: 149.3 MiB/s, write: 131.3 MiB/s
INFO:  63% (57.4 GiB of 91.1 GiB) in 6m 35s, read: 147.3 MiB/s, write: 127.3 MiB/s
INFO:  64% (58.4 GiB of 91.1 GiB) in 6m 42s, read: 150.3 MiB/s, write: 132.6 MiB/s
INFO:  65% (59.3 GiB of 91.1 GiB) in 6m 48s, read: 149.3 MiB/s, write: 125.3 MiB/s
INFO:  66% (60.2 GiB of 91.1 GiB) in 6m 54s, read: 149.3 MiB/s, write: 112.0 MiB/s
INFO:  67% (61.0 GiB of 91.1 GiB) in 7m, read: 149.3 MiB/s, write: 129.3 MiB/s
INFO:  68% (62.1 GiB of 91.1 GiB) in 7m 7s, read: 148.6 MiB/s, write: 141.1 MiB/s
INFO:  69% (62.9 GiB of 91.1 GiB) in 7m 13s, read: 150.0 MiB/s, write: 141.3 MiB/s
INFO:  70% (63.8 GiB of 91.1 GiB) in 7m 19s, read: 149.3 MiB/s, write: 143.3 MiB/s
INFO:  71% (64.7 GiB of 91.1 GiB) in 7m 25s, read: 149.3 MiB/s, write: 133.3 MiB/s
INFO:  72% (65.7 GiB of 91.1 GiB) in 7m 32s, read: 150.3 MiB/s, write: 126.9 MiB/s
INFO:  73% (66.6 GiB of 91.1 GiB) in 7m 38s, read: 149.3 MiB/s, write: 132.7 MiB/s
INFO:  74% (67.5 GiB of 91.1 GiB) in 7m 44s, read: 149.3 MiB/s, write: 138.0 MiB/s
INFO:  75% (68.3 GiB of 91.1 GiB) in 7m 50s, read: 149.3 MiB/s, write: 142.0 MiB/s
INFO:  76% (69.2 GiB of 91.1 GiB) in 7m 56s, read: 150.0 MiB/s, write: 140.7 MiB/s
INFO:  77% (70.2 GiB of 91.1 GiB) in 8m 3s, read: 150.3 MiB/s, write: 137.1 MiB/s
INFO:  78% (71.1 GiB of 91.1 GiB) in 8m 9s, read: 149.3 MiB/s, write: 140.0 MiB/s
INFO:  79% (72.0 GiB of 91.1 GiB) in 8m 15s, read: 148.7 MiB/s, write: 141.3 MiB/s
INFO:  80% (72.9 GiB of 91.1 GiB) in 8m 21s, read: 148.7 MiB/s, write: 142.7 MiB/s
INFO:  81% (73.9 GiB of 91.1 GiB) in 8m 28s, read: 149.7 MiB/s, write: 142.9 MiB/s
INFO:  82% (74.8 GiB of 91.1 GiB) in 8m 34s, read: 150.0 MiB/s, write: 142.0 MiB/s
INFO:  83% (75.6 GiB of 91.1 GiB) in 8m 40s, read: 150.7 MiB/s, write: 144.0 MiB/s
INFO:  84% (76.5 GiB of 91.1 GiB) in 8m 46s, read: 147.3 MiB/s, write: 141.3 MiB/s
INFO:  85% (77.5 GiB of 91.1 GiB) in 8m 53s, read: 150.3 MiB/s, write: 149.7 MiB/s
INFO:  86% (78.4 GiB of 91.1 GiB) in 8m 59s, read: 148.7 MiB/s, write: 139.3 MiB/s
INFO:  87% (79.3 GiB of 91.1 GiB) in 9m 5s, read: 150.0 MiB/s, write: 150.0 MiB/s
INFO:  88% (80.2 GiB of 91.1 GiB) in 9m 11s, read: 150.0 MiB/s, write: 137.3 MiB/s
INFO:  89% (81.2 GiB of 91.1 GiB) in 9m 18s, read: 148.6 MiB/s, write: 138.3 MiB/s
INFO:  90% (82.1 GiB of 91.1 GiB) in 9m 24s, read: 149.3 MiB/s, write: 144.7 MiB/s
INFO:  91% (82.9 GiB of 91.1 GiB) in 9m 30s, read: 149.3 MiB/s, write: 148.7 MiB/s
INFO:  92% (83.8 GiB of 91.1 GiB) in 9m 36s, read: 150.0 MiB/s, write: 147.3 MiB/s
INFO:  93% (84.8 GiB of 91.1 GiB) in 9m 43s, read: 149.1 MiB/s, write: 145.1 MiB/s
INFO:  94% (85.7 GiB of 91.1 GiB) in 9m 50s, read: 128.0 MiB/s, write: 126.9 MiB/s
INFO:  95% (86.6 GiB of 91.1 GiB) in 9m 56s, read: 148.0 MiB/s, write: 148.0 MiB/s
INFO:  96% (87.4 GiB of 91.1 GiB) in 10m 2s, read: 149.3 MiB/s, write: 148.7 MiB/s
INFO:  97% (88.5 GiB of 91.1 GiB) in 10m 9s, read: 148.6 MiB/s, write: 148.6 MiB/s
INFO:  98% (89.3 GiB of 91.1 GiB) in 10m 15s, read: 146.7 MiB/s, write: 146.7 MiB/s
INFO:  99% (90.2 GiB of 91.1 GiB) in 10m 21s, read: 149.3 MiB/s, write: 148.7 MiB/s
ERROR: VM 133 not running
INFO: aborting backup job
ERROR: VM 133 not running
INFO: resuming VM again
INFO: removing (old) fleecing image 'local:133/vm-133-fleece-0.qcow2'
INFO: removing (old) fleecing image 'local:133/vm-133-fleece-1.qcow2'
ERROR: Backup of VM 133 failed - VM 133 not running
INFO: Failed at 2026-04-11 23:25:41
INFO: Backup job finished with errors
INFO: notified via target `<*****@**********>`
TASK ERROR: job errors
 
Hi,

I would start with the memory. The two fleecing images together preallocate metadata for ~3 TB of disk, which can push memory usage over the edge after days of uptime with gradually growing memory pressure. Please check whether it is an out-of-memory issue:

Code:
dmesg | grep -i "oom\|killed process\|out of memory"
grep -i "oom\|killed" /var/log/syslog
# journald-only hosts have no /var/log/syslog; query the journal instead:
journalctl -b -g "oom|killed process|out of memory"
 
--bwlimit 153600
I would slow it down further. My goal would be to match the rate the PBS is actually able to absorb, in order to keep the fleecing file small. If that runs stably, I would then (relatively) slowly increase the bwlimit again.

Just my thought... though it is a slow and time-consuming process.
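If you want to try a lower limit without editing every job, a node-wide default can also be set in /etc/vzdump.conf (the value is in KiB/s, the same unit as the job's --bwlimit; 102400 here is only an example starting point, not a recommendation):

```
# /etc/vzdump.conf -- node-wide default bandwidth limit for vzdump (KiB/s)
# example value; tune to what your PBS can actually absorb
bwlimit: 102400
```

A per-job --bwlimit on the command line or in the backup job still overrides this default.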
 
Hi,

I would start with the memory. The two fleecing images together preallocate metadata for ~3 TB of disk, which can push memory usage over the edge after days of uptime with gradually growing memory pressure. Please check whether it is an out-of-memory issue:

Code:
dmesg | grep -i "oom\|killed process\|out of memory"
grep -i "oom\|killed" /var/log/syslog
# journald-only hosts have no /var/log/syslog; query the journal instead:
journalctl -b -g "oom|killed process|out of memory"
There is nothing in dmesg. The VM just suddenly stops without errors inside the guest, and there is no kernel panic or anything on VNC, because the VM is stopped, not in an error state. I have a similar VM (same size, system, configuration...) on PVE8 and it backs up without problems. The only difference is that here I have PVE9 and ZFS on the host instead of ext4. The machines (hosts) are otherwise identical (RAM, CPU, disks). I think it may be a problem with Proxmox / the host rather than with the VM itself.
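Even without an OOM trace, it might be worth watching host memory over those 10+ days: on a ZFS host the ARC plus the fleecing writes can add up. A rough snapshot to run periodically (the arcstats path only exists where the ZFS kernel module is loaded):

```shell
# overall host memory (watch "available" shrink over the days)
free -h

# current ZFS ARC size in GiB (skipped on non-ZFS hosts)
[ -r /proc/spl/kstat/zfs/arcstats ] && \
  awk '$1 == "size" {printf "ARC size: %.1f GiB\n", $3/2^30}' \
    /proc/spl/kstat/zfs/arcstats || true
```

Logging this from cron around the backup windows would show whether memory pressure correlates with the failing run.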
 
I found something else in journalctl (on the Proxmox host) from when the VM stopped:

Bash:
Apr 11 23:25:40 p** QEMU[4172743]: kvm: ../block/io.c:444: bdrv_drain_assert_idle: Assertion `qatomic_read(&bs->in_flight) == 0' failed.

It may be important that I had another problem with PVE9, as mentioned in this thread: https://forum.proxmox.com/threads/proxmox-9-io-error-zfs.179519 . I fixed it with "zfs set direct=disabled", but there may still be something wrong that makes backups fail after some time. I am considering going back to PVE8 on this host if the problem remains.
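Given the bdrv_drain assertion and the earlier O_DIRECT issue, it may be worth confirming that the workaround is still in effect on the dataset backing the VM images (the dataset name "rpool/data" below is a placeholder; substitute your actual pool/dataset):

```shell
# check that direct I/O is still disabled on the dataset holding the VM images
# ("rpool/data" is a placeholder -- substitute your actual pool/dataset)
zfs get -o name,property,value direct rpool/data 2>/dev/null \
  || echo "zfs not available or dataset name wrong -- adjust the placeholder"
```

If the property shows "standard" or "always" instead of "disabled", the setting may have been reverted, e.g. by a pool recreation or an inherited value.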
 