Possible bug after upgrading to 7.2: VM freezes when backing up large disks

mxscbv

Member
Jan 25, 2022
I'm experiencing issues when backing up VMs with large disks (e.g. 10TB, 20TB) to a remote PBS.

When the backup starts, first the backup process itself hangs at 0% like this:

INFO: starting new backup job: vzdump 18723 --storage pbs --notes-template '{{guestname}}' --node p01 --remove 0 --mode snapshot
INFO: Starting Backup of VM 18723 (qemu)
INFO: Backup started at 2022-05-09 20:00:16
INFO: status = running
INFO: include disk 'scsi0' 'cpool:vm-18723-disk-1' 300G
INFO: include disk 'scsi1' 'cpool:vm-18723-disk-2' 10300G
INFO: include disk 'scsi2' 'hdpool:vm-18723-disk-0' 15500G
INFO: include disk 'efidisk0' 'cpool:vm-18723-disk-0' 1M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: pending configuration changes found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/18723/2022-05-09T20:00:16Z'
INFO: enabling encryption
INFO: started backup task '675143a4-05b1-4eef-9b60-f5eadb7a7523'
INFO: resuming VM again
INFO: efidisk0: dirty-bitmap status: created new
INFO: scsi0: dirty-bitmap status: created new
INFO: scsi1: dirty-bitmap status: created new
INFO: scsi2: dirty-bitmap status: created new
INFO: 0% (480.0 MiB of 25.5 TiB) in 3s, read: 160.0 MiB/s, write: 17.3 MiB/s

At the same time (or a bit later) the VM becomes unresponsive, I can't connect to the VM's console (via the Proxmox GUI), and eventually it hangs completely unless I cancel the backup, unlock the VM via 'qm unlock ID', and stop/start it.
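
For reference, the recovery sequence I end up running looks roughly like this (18723 is the affected VM in my case; the stuck backup task has to be cancelled first):

qm unlock 18723    # remove the backup lock left on the VM
qm stop 18723      # the guest is hung, so a clean shutdown usually isn't possible
qm start 18723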

If the VM is running during the backup, I sometimes see kernel panics/errors like:

kernel:watchdog: BUG: soft lockup - CPU#58 stuck for 22s! [kworker/58:2:822]

Everything works fine with smaller disks like 500-1000GB.

Can anyone advise how to sort this out?

Thanks a lot!
 
Which PVE version? Which guest OS?
 
The latest PVE, just updated to 7.2.
The guest OS is AlmaLinux, but I don't think it matters, as the same 'hang' happens even when the VM is powered off.
 
- "remote" PMG is where exactly ? Not in your LAN I understand ?
- have you tried to backup without encryption (as a test) ?
 
No, it's a remote destination. I also tried backing up to local storage, with the same result.

journalctl shows:

May 10 10:29:29 p01 pvedaemon[1463275]: VM 18723 qmp command failed - VM 18723 qmp command 'query-pbs-bitmap-info' failed - got timeout
May 10 10:29:32 p01 pvedaemon[3914617]: VM 18723 qmp command failed - VM 18723 qmp command 'query-proxmox-support' failed - got timeout
May 10 10:29:36 p01 pvestatd[26067]: VM 18723 qmp command failed - VM 18723 qmp command 'query-proxmox-support' failed - unable to con>
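
For context, I'm watching these messages while the backup runs with something along the lines of:

journalctl -f -u pvedaemon -u pvestatd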
 
I seem to have the exact same problem since upgrading to 7.2. On an upgraded node, the backup is very slow, the VM is unresponsive, and even the web GUI takes its time to load the VM's memory and CPU utilization.

Backup log on 7.2:
INFO: starting new backup job: vzdump 121 --remove 0 --mode snapshot --node PROX-B1 --storage PBS-HW
INFO: Starting Backup of VM 121 (qemu)
INFO: Backup started at 2022-05-10 16:50:07
INFO: status = running
INFO: VM Name: VGFILE2
INFO: include disk 'virtio0' 'VG-Pool:vm-121-disk-0' 128G
INFO: include disk 'virtio1' 'VG-Pool:vm-121-disk-1' 5T
INFO: include disk 'virtio2' 'VG-Pool:vm-121-disk-2' 1T
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/121/2022-05-10T14:50:07Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '8807679a-27e5-4440-861b-f83da25d3fbc'
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: created new
INFO: virtio1: dirty-bitmap status: created new
INFO: virtio2: dirty-bitmap status: created new
INFO: 0% (3.1 GiB of 6.1 TiB) in 4s, read: 800.0 MiB/s, write: 29.0 MiB/s
INFO: 1% (63.3 GiB of 6.1 TiB) in 2m 37s, read: 402.7 MiB/s, write: 1.7 MiB/s
INFO: 2% (125.5 GiB of 6.1 TiB) in 5m 9s, read: 419.0 MiB/s, write: 1.8 MiB/s
INFO: 3% (188.8 GiB of 6.1 TiB) in 7m 51s, read: 400.2 MiB/s, write: 1011.4 KiB/s
INFO: 4% (251.7 GiB of 6.1 TiB) in 10m 7s, read: 473.6 MiB/s, write: 481.9 KiB/s
INFO: 5% (314.6 GiB of 6.1 TiB) in 12m 31s, read: 447.3 MiB/s, write: 455.1 KiB/s
INFO: 6% (376.6 GiB of 6.1 TiB) in 14m 49s, read: 459.6 MiB/s, write: 237.4 KiB/s
INFO: 7% (439.5 GiB of 6.1 TiB) in 17m 9s, read: 460.7 MiB/s, write: 614.4 KiB/s
INFO: 8% (502.4 GiB of 6.1 TiB) in 19m 35s, read: 440.7 MiB/s, write: 813.6 KiB/s
INFO: 9% (564.7 GiB of 6.1 TiB) in 21m 51s, read: 469.0 MiB/s, write: 572.2 KiB/s
INFO: 10% (628.2 GiB of 6.1 TiB) in 24m 8s, read: 474.5 MiB/s, write: 388.7 KiB/s
INFO: 11% (691.5 GiB of 6.1 TiB) in 26m 25s, read: 473.3 MiB/s, write: 508.3 KiB/s
INFO: 12% (753.2 GiB of 6.1 TiB) in 28m 39s, read: 471.7 MiB/s, write: 122.3 KiB/s
INFO: 13% (815.8 GiB of 6.1 TiB) in 30m 55s, read: 471.1 MiB/s, write: 1.4 MiB/s
INFO: 14% (878.7 GiB of 6.1 TiB) in 33m 12s, read: 470.0 MiB/s, write: 59.8 KiB/s
INFO: 15% (940.9 GiB of 6.1 TiB) in 35m 27s, read: 472.3 MiB/s, write: 60.7 KiB/s
INFO: 16% (1003.6 GiB of 6.1 TiB) in 37m 45s, read: 465.0 MiB/s, write: 29.7 KiB/s

For comparison, the backup log of the same VM on a not-yet-upgraded node in the same cluster:
INFO: starting new backup job: vzdump 121 --mode snapshot --remove 0 --storage PBS-HW --node PROX-B2
INFO: Starting Backup of VM 121 (qemu)
INFO: Backup started at 2022-05-10 15:55:16
INFO: status = running
INFO: VM Name: VGFILE2
INFO: include disk 'virtio0' 'VG-Pool:vm-121-disk-0' 128G
INFO: include disk 'virtio1' 'VG-Pool:vm-121-disk-1' 5T
INFO: include disk 'virtio2' 'VG-Pool:vm-121-disk-2' 1T
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/121/2022-05-10T13:55:16Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'c31b0272-e124-4f7c-90e7-1f36a95b2d39'
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: created new
INFO: virtio1: dirty-bitmap status: created new
INFO: virtio2: dirty-bitmap status: created new
INFO: 0% (7.4 GiB of 6.1 TiB) in 3s, read: 2.5 GiB/s, write: 177.3 MiB/s
INFO: 1% (64.5 GiB of 6.1 TiB) in 18s, read: 3.8 GiB/s, write: 9.6 MiB/s
INFO: 2% (134.2 GiB of 6.1 TiB) in 33s, read: 4.6 GiB/s, write: 8.5 MiB/s
INFO: 3% (197.1 GiB of 6.1 TiB) in 57s, read: 2.6 GiB/s, write: 48.3 MiB/s
INFO: 4% (257.5 GiB of 6.1 TiB) in 1m 3s, read: 10.1 GiB/s, write: 0 B/s
INFO: 5% (318.2 GiB of 6.1 TiB) in 1m 10s, read: 8.7 GiB/s, write: 0 B/s
INFO: 6% (379.1 GiB of 6.1 TiB) in 1m 16s, read: 10.1 GiB/s, write: 0 B/s
INFO: 7% (449.1 GiB of 6.1 TiB) in 1m 23s, read: 10.0 GiB/s, write: 0 B/s
INFO: 8% (507.6 GiB of 6.1 TiB) in 1m 29s, read: 9.8 GiB/s, write: 0 B/s
INFO: 9% (567.0 GiB of 6.1 TiB) in 1m 36s, read: 8.5 GiB/s, write: 0 B/s
INFO: 10% (627.5 GiB of 6.1 TiB) in 1m 42s, read: 10.1 GiB/s, write: 0 B/s
INFO: 11% (696.9 GiB of 6.1 TiB) in 1m 49s, read: 9.9 GiB/s, write: 0 B/s
INFO: 12% (756.0 GiB of 6.1 TiB) in 1m 56s, read: 8.4 GiB/s, write: 0 B/s
INFO: 13% (824.5 GiB of 6.1 TiB) in 2m 3s, read: 9.8 GiB/s, write: 0 B/s
INFO: 14% (883.5 GiB of 6.1 TiB) in 2m 9s, read: 9.8 GiB/s, write: 0 B/s
INFO: 15% (943.0 GiB of 6.1 TiB) in 2m 16s, read: 8.5 GiB/s, write: 0 B/s
INFO: 16% (1004.0 GiB of 6.1 TiB) in 2m 22s, read: 10.2 GiB/s, write: 0 B/s
INFO: 17% (1.0 TiB of 6.1 TiB) in 2m 52s, read: 2.1 GiB/s, write: 27.3 MiB/s
INFO: 18% (1.1 TiB of 6.1 TiB) in 3m 35s, read: 1.5 GiB/s, write: 0 B/s
INFO: 19% (1.2 TiB of 6.1 TiB) in 4m 19s, read: 1.4 GiB/s, write: 50.2 MiB/s
INFO: 20% (1.2 TiB of 6.1 TiB) in 5m 1s, read: 1.5 GiB/s, write: 7.1 MiB/s
INFO: 21% (1.3 TiB of 6.1 TiB) in 5m 57s, read: 1.1 GiB/s, write: 69.8 MiB/s
INFO: 22% (1.3 TiB of 6.1 TiB) in 7m 5s, read: 954.5 MiB/s, write: 61.5 MiB/s
INFO: 23% (1.4 TiB of 6.1 TiB) in 7m 51s, read: 1.4 GiB/s, write: 0 B/s
INFO: 24% (1.5 TiB of 6.1 TiB) in 8m 37s, read: 1.4 GiB/s, write: 178.1 KiB/s
INFO: 25% (1.5 TiB of 6.1 TiB) in 9m 23s, read: 1.4 GiB/s, write: 0 B/s
INFO: 26% (1.6 TiB of 6.1 TiB) in 10m 9s, read: 1.4 GiB/s, write: 0 B/s
INFO: 27% (1.7 TiB of 6.1 TiB) in 10m 56s, read: 1.4 GiB/s, write: 261.4 KiB/s
INFO: 28% (1.7 TiB of 6.1 TiB) in 11m 41s, read: 1.4 GiB/s, write: 182.0 KiB/s
INFO: 29% (1.8 TiB of 6.1 TiB) in 12m 29s, read: 1.3 GiB/s, write: 0 B/s
INFO: 30% (1.8 TiB of 6.1 TiB) in 13m 15s, read: 1.3 GiB/s, write: 0 B/s
INFO: 31% (1.9 TiB of 6.1 TiB) in 14m 1s, read: 1.4 GiB/s, write: 0 B/s
INFO: 32% (2.0 TiB of 6.1 TiB) in 14m 47s, read: 1.4 GiB/s, write: 89.0 KiB/s
INFO: 33% (2.0 TiB of 6.1 TiB) in 15m 34s, read: 1.3 GiB/s, write: 0 B/s
INFO: 34% (2.1 TiB of 6.1 TiB) in 16m 20s, read: 1.3 GiB/s, write: 0 B/s
INFO: 35% (2.1 TiB of 6.1 TiB) in 17m 7s, read: 1.3 GiB/s, write: 0 B/s
INFO: 36% (2.2 TiB of 6.1 TiB) in 17m 54s, read: 1.3 GiB/s, write: 174.3 KiB/s
INFO: 37% (2.3 TiB of 6.1 TiB) in 18m 39s, read: 1.4 GiB/s, write: 0 B/s
INFO: 38% (2.3 TiB of 6.1 TiB) in 19m 25s, read: 1.4 GiB/s, write: 89.0 KiB/s
INFO: 39% (2.4 TiB of 6.1 TiB) in 20m 10s, read: 1.4 GiB/s, write: 0 B/s
INFO: 40% (2.5 TiB of 6.1 TiB) in 20m 55s, read: 1.4 GiB/s, write: 1.9 MiB/s
INFO: 41% (2.5 TiB of 6.1 TiB) in 21m 40s, read: 1.4 GiB/s, write: 182.0 KiB/s
INFO: 42% (2.6 TiB of 6.1 TiB) in 22m 24s, read: 1.4 GiB/s, write: 0 B/s
INFO: 43% (2.6 TiB of 6.1 TiB) in 23m 9s, read: 1.4 GiB/s, write: 0 B/s
INFO: 44% (2.7 TiB of 6.1 TiB) in 23m 53s, read: 1.4 GiB/s, write: 0 B/s
INFO: 45% (2.8 TiB of 6.1 TiB) in 24m 37s, read: 1.4 GiB/s, write: 0 B/s
INFO: 46% (2.8 TiB of 6.1 TiB) in 25m 22s, read: 1.4 GiB/s, write: 0 B/s
INFO: 47% (2.9 TiB of 6.1 TiB) in 26m 7s, read: 1.4 GiB/s, write: 0 B/s
INFO: 48% (2.9 TiB of 6.1 TiB) in 26m 52s, read: 1.4 GiB/s, write: 0 B/s
INFO: 49% (3.0 TiB of 6.1 TiB) in 27m 51s, read: 1.1 GiB/s, write: 95.0 MiB/s
INFO: 50% (3.1 TiB of 6.1 TiB) in 27m 58s, read: 9.9 GiB/s, write: 0 B/s
INFO: 51% (3.1 TiB of 6.1 TiB) in 28m 5s, read: 8.5 GiB/s, write: 0 B/s
INFO: 52% (3.2 TiB of 6.1 TiB) in 28m 11s, read: 10.1 GiB/s, write: 0 B/s
INFO: 53% (3.3 TiB of 6.1 TiB) in 28m 18s, read: 10.1 GiB/s, write: 0 B/s
INFO: 54% (3.3 TiB of 6.1 TiB) in 28m 25s, read: 8.5 GiB/s, write: 0 B/s
INFO: 55% (3.4 TiB of 6.1 TiB) in 28m 31s, read: 10.0 GiB/s, write: 0 B/s
INFO: 56% (3.4 TiB of 6.1 TiB) in 28m 37s, read: 9.9 GiB/s, write: 0 B/s
INFO: 57% (3.5 TiB of 6.1 TiB) in 28m 44s, read: 10.0 GiB/s, write: 0 B/s
INFO: 58% (3.6 TiB of 6.1 TiB) in 28m 51s, read: 8.6 GiB/s, write: 0 B/s
INFO: 59% (3.6 TiB of 6.1 TiB) in 28m 57s, read: 10.0 GiB/s, write: 0 B/s
INFO: 60% (3.7 TiB of 6.1 TiB) in 29m 4s, read: 9.9 GiB/s, write: 0 B/s
INFO: 61% (3.7 TiB of 6.1 TiB) in 29m 10s, read: 10.0 GiB/s, write: 0 B/s
INFO: 62% (3.8 TiB of 6.1 TiB) in 29m 16s, read: 9.9 GiB/s, write: 0 B/s
INFO: 63% (3.9 TiB of 6.1 TiB) in 29m 24s, read: 8.7 GiB/s, write: 0 B/s
INFO: 64% (3.9 TiB of 6.1 TiB) in 29m 30s, read: 9.9 GiB/s, write: 0 B/s
INFO: 65% (4.0 TiB of 6.1 TiB) in 29m 36s, read: 9.9 GiB/s, write: 0 B/s
INFO: 66% (4.1 TiB of 6.1 TiB) in 29m 44s, read: 8.7 GiB/s, write: 0 B/s
INFO: 67% (4.1 TiB of 6.1 TiB) in 29m 50s, read: 10.1 GiB/s, write: 0 B/s
INFO: 68% (4.2 TiB of 6.1 TiB) in 29m 56s, read: 9.9 GiB/s, write: 0 B/s
INFO: 69% (4.2 TiB of 6.1 TiB) in 30m 3s, read: 8.5 GiB/s, write: 0 B/s
INFO: 70% (4.3 TiB of 6.1 TiB) in 30m 10s, read: 9.9 GiB/s, write: 0 B/s
INFO: 71% (4.4 TiB of 6.1 TiB) in 30m 16s, read: 9.9 GiB/s, write: 0 B/s
INFO: 72% (4.4 TiB of 6.1 TiB) in 30m 23s, read: 8.4 GiB/s, write: 0 B/s
INFO: 73% (4.5 TiB of 6.1 TiB) in 30m 30s, read: 9.9 GiB/s, write: 0 B/s
INFO: 74% (4.5 TiB of 6.1 TiB) in 30m 36s, read: 9.8 GiB/s, write: 0 B/s
INFO: 75% (4.6 TiB of 6.1 TiB) in 30m 44s, read: 8.6 GiB/s, write: 0 B/s
INFO: 76% (4.7 TiB of 6.1 TiB) in 30m 50s, read: 9.8 GiB/s, write: 0 B/s
INFO: 77% (4.7 TiB of 6.1 TiB) in 30m 56s, read: 9.9 GiB/s, write: 0 B/s
INFO: 78% (4.8 TiB of 6.1 TiB) in 31m 4s, read: 8.7 GiB/s, write: 0 B/s
INFO: 79% (4.8 TiB of 6.1 TiB) in 31m 10s, read: 9.9 GiB/s, write: 0 B/s
INFO: 80% (4.9 TiB of 6.1 TiB) in 31m 16s, read: 10.1 GiB/s, write: 0 B/s
INFO: 81% (5.0 TiB of 6.1 TiB) in 31m 22s, read: 10.0 GiB/s, write: 0 B/s
INFO: 82% (5.0 TiB of 6.1 TiB) in 31m 30s, read: 8.6 GiB/s, write: 0 B/s
INFO: 83% (5.1 TiB of 6.1 TiB) in 31m 36s, read: 9.9 GiB/s, write: 0 B/s
INFO: 84% (5.1 TiB of 6.1 TiB) in 31m 42s, read: 9.8 GiB/s, write: 0 B/s
INFO: 85% (5.2 TiB of 6.1 TiB) in 31m 49s, read: 10.0 GiB/s, write: 0 B/s
INFO: 86% (5.3 TiB of 6.1 TiB) in 31m 55s, read: 10.1 GiB/s, write: 0 B/s
INFO: 87% (5.3 TiB of 6.1 TiB) in 32m 2s, read: 8.5 GiB/s, write: 0 B/s
INFO: 88% (5.4 TiB of 6.1 TiB) in 32m 9s, read: 10.0 GiB/s, write: 0 B/s
INFO: 89% (5.5 TiB of 6.1 TiB) in 32m 15s, read: 10.0 GiB/s, write: 0 B/s
INFO: 90% (5.5 TiB of 6.1 TiB) in 32m 21s, read: 10.0 GiB/s, write: 0 B/s
INFO: 91% (5.6 TiB of 6.1 TiB) in 32m 27s, read: 10.0 GiB/s, write: 0 B/s
INFO: 92% (5.6 TiB of 6.1 TiB) in 32m 35s, read: 8.6 GiB/s, write: 0 B/s
INFO: 93% (5.7 TiB of 6.1 TiB) in 32m 41s, read: 10.0 GiB/s, write: 0 B/s
INFO: 94% (5.8 TiB of 6.1 TiB) in 32m 47s, read: 10.1 GiB/s, write: 0 B/s
INFO: 95% (5.8 TiB of 6.1 TiB) in 32m 54s, read: 10.0 GiB/s, write: 0 B/s
INFO: 96% (5.9 TiB of 6.1 TiB) in 33m, read: 9.9 GiB/s, write: 0 B/s
INFO: 97% (5.9 TiB of 6.1 TiB) in 33m 7s, read: 8.3 GiB/s, write: 0 B/s
INFO: 98% (6.0 TiB of 6.1 TiB) in 33m 21s, read: 4.4 GiB/s, write: 85.1 MiB/s
INFO: 99% (6.1 TiB of 6.1 TiB) in 34m 55s, read: 727.4 MiB/s, write: 96.7 MiB/s
INFO: 100% (6.1 TiB of 6.1 TiB) in 35m 1s, read: 9.7 GiB/s, write: 0 B/s
INFO: backup is sparse: 4.17 TiB (68%) total zero data
INFO: backup was done incrementally, reused 6.10 TiB (99%)
INFO: transferred 6.12 TiB in 2102 seconds (3.0 GiB/s)
INFO: Finished Backup of VM 121 (00:35:08)
INFO: Backup finished at 2022-05-10 16:30:24
INFO: Backup job finished successfully
TASK OK

The log on the upgraded node is full of the following while the backup is running:
May 10 17:35:04 PROX-B1 pvestatd[3495]: status update time (6.219 seconds)
May 10 17:35:09 PROX-B1 pve-ha-lrm[88954]: VM 121 qmp command failed - VM 121 qmp command 'query-status' failed - got timeout
May 10 17:35:09 PROX-B1 pve-ha-lrm[88954]: VM 121 qmp command 'query-status' failed - got timeout
May 10 17:35:14 PROX-B1 pvestatd[3495]: VM 121 qmp command failed - VM 121 qmp command 'query-proxmox-support' failed - got timeout
May 10 17:35:15 PROX-B1 pvestatd[3495]: status update time (6.215 seconds)
May 10 17:35:19 PROX-B1 pve-ha-lrm[89224]: VM 121 qmp command failed - VM 121 qmp command 'query-status' failed - got timeout
May 10 17:35:19 PROX-B1 pve-ha-lrm[89224]: VM 121 qmp command 'query-status' failed - got timeout
May 10 17:35:24 PROX-B1 pvestatd[3495]: VM 121 qmp command failed - VM 121 qmp command 'query-proxmox-support' failed - got timeout
May 10 17:35:24 PROX-B1 pvestatd[3495]: status update time (6.215 seconds)

This is the only VM producing such errors, and it happens to be the only one with a 5TB disk. The OS in the VM is Windows Server 2019 and, as stated, the behaviour started right at the first backup after the upgrade to 7.2.
Please let me know if I can provide further information.
 
True. It seems like it only affects VMs with large disks attached.

I was forced to disable all backups; it's a major, critical issue that needs to be sorted out!

In my cluster, it only happens when a hard disk is large enough (e.g. 1TB or more).

Backing up locally, I get:

VM 18723 qmp command failed - VM 18723 qmp command 'query-proxmox-support' failed - got timeout

If I remove that large hard disk (and create a new, smaller one, e.g. 300G), everything works.
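
Roughly the commands I use for that test, assuming VM 18723 and the 'cpool' storage from my earlier log (adjust names to your setup):

qm set 18723 --delete scsi1       # detach the large disk (the volume is kept as an 'unused' entry)
qm set 18723 --scsi1 cpool:300    # allocate a fresh 300G volume on the same storage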

PVE and everything else is updated to the latest version (7.2); no idea what's going on.

Anyone?
 
Hi there,
I can confirm this issue since 7.2-3.
We are using Proxmox Backup Server for backups; several VMs with larger disks (1TB or more) are freezing during the backup.
Windows guests get a disk I/O error after being in backup for about 30 to 45 minutes and then just freeze. We have Windows Server 2022 with Exchange 2019 running; today we will move the databases to "smaller" disks to confirm whether the issue is resolved then.

I will also file a bug report now. *EDIT* show_bug.cgi

By the way... the 1TB disk has to be filled with data to around 80% or more to run into this issue; empty or partially filled disks do not trigger it.
 
By the way... the 1TB disk has to be filled with data to around 80% or more to run into this issue; empty or partially filled disks do not trigger it.
I cannot confirm this. In my VM I have 3 disks: one 128GB filled with 19.6GB, one 1TB filled with 87.6GB, and one 5TB filled with 61.1GB, and it shows the stated behaviour...
 
I cannot confirm this. In my VM I have 3 disks: one 128GB filled with 19.6GB, one 1TB filled with 87.6GB, and one 5TB filled with 61.1GB, and it shows the stated behaviour...
Same. It happens even with empty, just-created disks.
 
Will anyone from Proxmox visit this thread? This is a critical issue as we can't make any backups now.
 
Hi,
I am able to reproduce an issue (but I'm not entirely sure it's the same as reported here) when backing up a VM with a large disk. Namely, when there are large amounts of zeroes to be read, it seems like the QEMU main loop, and thus the guest execution, gets starved. Since zeroes can be handled differently, they don't need to be written to the backup, and thus the read speed can get very large, apparently too large. But this issue seems to be present in pve-qemu-kvm=6.0.0-2 already.

Which version were you using before upgrading to PVE 7.2?

Will anyone from Proxmox visit this thread? This is a critical issue as we can't make any backups now.
Please open a support ticket if you require immediate assistance.
 
Could you try setting the controller type to Virtio SCSI single and turn on the iothread setting on the disks? Many thanks to @aaron for the suggestion. This only works for virtio and scsi disks IIRC.
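
On the CLI that would be something along these lines (VM ID, storage and volume name are placeholders; when re-specifying a disk with qm set, keep its existing options and just add iothread=1):

qm set <vmid> --scsihw virtio-scsi-single
qm set <vmid> --scsi0 <storage>:vm-<vmid>-disk-0,iothread=1

The same settings are also available in the GUI under the VM's Hardware tab.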

7.1

I believe it's a bug anyway, so 'assistance' won't help that much.
But we could take a closer look at the concrete situation and do less guesswork.
 
Which version were you using before upgrading to PVE 7.2?
Package versions of a node that does not show this behaviour:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-12 (running version: 7.1-12/b3c09de3)
pve-kernel-helper: 7.1-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-5
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-3-pve: 5.11.22-7
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.124-1-pve: 5.4.124-2
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-7
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-5
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-6
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
 
Could you try setting the controller type to Virtio SCSI single and turn on the iothread setting on the disks?
Same.

May 11 11:50:49 p01 pvedaemon[1798702]: VM 18723 qmp command failed - VM 18723 qmp command 'query-proxmox-support' failed - unable to connect to VM 18723 qmp socket - timeout after 31 retries
May 11 11:50:57 p01 pvestatd[490672]: VM 18723 qmp command failed - VM 18723 qmp command 'query-proxmox-support' failed - unable to connect to VM 18723 qmp socket - timeout after 31 retries
May 11 11:50:57 p01 pvestatd[490672]: status update time (6.404 seconds)
 
Could you check if downgrading to pve-qemu-kvm=6.1.1-2 (VM needs to be stopped/started to pick up the new version) works around the issue?
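
A rough sketch of the downgrade and restart (apt may warn about the downgrade; replace <vmid> with the affected VM):

apt install pve-qemu-kvm=6.1.1-2
qm stop <vmid>
qm start <vmid>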
 
Could you check if downgrading to pve-qemu-kvm=6.1.1-2 (VM needs to be stopped/started to pick up the new version) works around the issue?
Upon doing the first couple of tests, it seems like there are no 'unable to connect to VM' errors anymore after the downgrade, when both backing up to CephFS and Proxmox Backup Server.
 
Upon doing the first couple of tests, it seems like there are no 'unable to connect to VM' errors anymore after the downgrade, when both backing up to CephFS and Proxmox Backup Server.
What about pve-qemu-kvm=6.2.0-1?

@GrueneNeun Since you have the issue with (mostly) empty disks: does
Could you try setting the controller type to Virtio SCSI single and turn on the iothread setting on the disks? Many thanks to @aaron for the suggestion. This only works for virtio and scsi disks IIRC.
help for you? Do you also get qmp timeout errors in the journal/syslog?
 
