ERROR: VM 103 qmp command 'query-backup' failed - got timeout ?

fpausp

Hi, I run the latest version of Proxmox VE and Proxmox Backup Server...

When I try to backup my VMs I always get an error on just one machine (vm103):


Code:
INFO:  75% (1.1 TiB of 1.5 TiB) in 34m 43s, read: 550.7 MiB/s, write: 6.3 MiB/s
INFO:  76% (1.1 TiB of 1.5 TiB) in 37m 23s, read: 95.5 MiB/s, write: 921.6 KiB/s
INFO:  77% (1.1 TiB of 1.5 TiB) in 39m 23s, read: 127.5 MiB/s, write: 136.5 KiB/s
INFO:  78% (1.1 TiB of 1.5 TiB) in 41m 33s, read: 118.7 MiB/s, write: 94.5 KiB/s
INFO:  79% (1.2 TiB of 1.5 TiB) in 43m 23s, read: 138.8 MiB/s, write: 335.1 KiB/s
INFO:  80% (1.2 TiB of 1.5 TiB) in 44m 23s, read: 266.5 MiB/s, write: 68.3 KiB/s
INFO:  81% (1.2 TiB of 1.5 TiB) in 44m 47s, read: 634.8 MiB/s, write: 0 B/s
INFO:  82% (1.2 TiB of 1.5 TiB) in 44m 52s, read: 3.6 GiB/s, write: 0 B/s
INFO:  83% (1.2 TiB of 1.5 TiB) in 45m  3s, read: 1.5 GiB/s, write: 0 B/s
INFO:  84% (1.2 TiB of 1.5 TiB) in 45m  6s, read: 4.8 GiB/s, write: 0 B/s
INFO:  85% (1.2 TiB of 1.5 TiB) in 45m  9s, read: 4.9 GiB/s, write: 0 B/s
INFO:  86% (1.3 TiB of 1.5 TiB) in 45m 12s, read: 5.0 GiB/s, write: 0 B/s
INFO:  87% (1.3 TiB of 1.5 TiB) in 45m 15s, read: 5.0 GiB/s, write: 0 B/s
INFO:  88% (1.3 TiB of 1.5 TiB) in 45m 18s, read: 4.8 GiB/s, write: 0 B/s
INFO:  89% (1.3 TiB of 1.5 TiB) in 45m 21s, read: 4.3 GiB/s, write: 0 B/s
INFO:  90% (1.3 TiB of 1.5 TiB) in 45m 24s, read: 4.9 GiB/s, write: 0 B/s
INFO:  91% (1.3 TiB of 1.5 TiB) in 45m 27s, read: 4.9 GiB/s, write: 0 B/s
INFO:  92% (1.3 TiB of 1.5 TiB) in 45m 30s, read: 5.0 GiB/s, write: 0 B/s
INFO:  93% (1.4 TiB of 1.5 TiB) in 45m 37s, read: 2.1 GiB/s, write: 0 B/s
INFO:  94% (1.4 TiB of 1.5 TiB) in 45m 40s, read: 5.0 GiB/s, write: 0 B/s
INFO:  95% (1.4 TiB of 1.5 TiB) in 45m 43s, read: 4.8 GiB/s, write: 0 B/s
INFO:  96% (1.4 TiB of 1.5 TiB) in 45m 46s, read: 5.1 GiB/s, write: 0 B/s
INFO:  97% (1.4 TiB of 1.5 TiB) in 45m 49s, read: 4.9 GiB/s, write: 0 B/s
INFO:  98% (1.4 TiB of 1.5 TiB) in 45m 52s, read: 5.0 GiB/s, write: 0 B/s
INFO:  99% (1.5 TiB of 1.5 TiB) in 45m 56s, read: 4.6 GiB/s, write: 0 B/s
ERROR: VM 103 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job

Do you recognize this error, and how can I solve it?
 
Hi guys, I just want to confirm that the issue still exists. All of our VMs are Debian (versions 7 through 10) with the qemu-guest-agent installed (only a couple are without it). The issue appears randomly and bricks the VM (only a reboot helps). After the reboot it works fine for, say, 7 days before the issue reappears. I also found on the forum that updating qemu-server from the test repository possibly solves the issue (we have done the update, but the issue still appears). When I do a manual backup to PBS, it works without a problem.

Code:
INFO: Starting Backup of VM 163 (qemu)
INFO: Backup started at 2020-11-05 03:33:39
INFO: status = running
INFO: VM Name: xxx
INFO: include disk 'scsi0' 'shared_lvm:vm-163-disk-0' 10G
INFO: include disk 'scsi1' 'shared_lvm:vm-163-disk-0' 100G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/163/2020-11-05T02:33:39Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 163 qmp command 'guest-fsfreeze-thaw' failed - got timeout
ERROR: VM 163 qmp command 'backup' failed - got timeout
ERROR: Backup of VM 163 failed - VM 163 qmp command 'backup' failed - got timeout
INFO: Failed at 2020-11-05 03:34:

We run the newest stable version of Proxmox VE (community subscription, a cluster of 5 nodes with 10G interfaces, separate networks for cluster traffic and VM data, FC for shared storage) and Proxmox Backup Server.

Code:
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-12 (running version: 6.2-12/b287dd27)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.13-3-pve: 5.3.13-3
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 0.9.4-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-6
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-3
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-15
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve2




Backup Server 0.9-6

We've been using PBS for a while.

Thanks.

BR.

Michael

P.S.

Probably also related to: https://forum.proxmox.com/threads/qmp-command-backup-failed-got-timeout.77749/
 
This topic is old, but I'll share what I found as a solution.

First I logged into the VM whose backup was failing and updated the package lists (apt update).
Then I reinstalled the QEMU Guest Agent (apt install qemu-guest-agent).

And after that the backup worked normally.
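
For anyone following the same route, the steps inside the guest boil down to this (Debian/Ubuntu guests; the package name may differ on other distributions):

Code:
# inside the affected guest
apt update
apt install --reinstall qemu-guest-agent
# make sure the agent is actually running
systemctl enable --now qemu-guest-agent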
 
I have a very similar problem when backing up a Windows server; the difference is that in my case the backup does not even start.
I will try reinstalling the qemu-guest-agent and report back.

Code:
INFO: Starting Backup of VM 888 (qemu)
INFO: Backup started at 2023-05-25 xx:xx:xx
INFO: status = running
INFO: VM Name: customer-server01
INFO: include disk 'scsi0' 'customer-storage01:vm-888-disk-1' 100G
INFO: include disk 'scsi1' 'customer-storage01:vm-888-disk-3' 800G
INFO: include disk 'efidisk0' 'customer-storage01:vm-888-disk-0' 528K
INFO: include disk 'tpmstate0' 'customer-storage01:vm-888-disk-4' 4M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: skip unused drive 'customer-storage01:vm-888-disk-2' (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/888/2023-05-25Txx:xx:xxZ'
INFO: attaching TPM drive to QEMU for backup
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 888 qmp command 'backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 888 failed - VM 888 qmp command 'backup' failed - got timeout
INFO: Failed at 2023-05-25 xx:xx:xx
 
After some time without the problem, the errors occurred again: the backups of several machines ended with a timeout. In addition, I could not list the backups on the Proxmox Backup Server in the VM's backup overview; that timed out as well.

When I looked at the backup server, I saw a large number of running tasks and high I/O load on the hard disks.

The tasks were verifications of large machines and garbage collection. After a reboot, listing is fast again and backups complete without problems.
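
In case someone wants to check for the same condition before rebooting: on the PBS host, something like this shows the running tasks and the disk load (iostat comes from the sysstat package):

Code:
# list currently running tasks (verify jobs, garbage collection, ...)
proxmox-backup-manager task list
# watch per-disk utilization; a saturated datastore disk sits near 100 %util
iostat -x 1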
 

I'll join the "club": latest Proxmox VE (8.0.4), qemu-guest-agent installed, Windows Server 2019. The backup started at midnight and stopped at 64%.

Code:
INFO: 63% (473.5 GiB of 750.0 GiB) in 2h 23m, read: 25.7 MiB/s, write: 22.4 MiB/s
INFO: 64% (480.1 GiB of 750.0 GiB) in 2h 26m 20s, read: 33.7 MiB/s, write: 21.7 MiB/s
ERROR: VM 100 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job
ERROR: VM 100 qmp command 'backup-cancel' failed - unable to connect to VM 100 qmp socket - timeout after 5988 retries
INFO: resuming VM again
ERROR: Backup of VM 100 failed - VM 100 qmp command 'cont' failed - unable to connect to VM 100 qmp socket - timeout after 450 retries
INFO: Failed at 2023-11-20 02:51:15
INFO: Backup job finished with errors

TASK ERROR: job errors

After that there was no VNC connection (vncproxy timeout), the VM was unreachable, no shutdown was possible, only a forced stop.

It was a local backup.

Honestly, I've been having a hard time with backups lately...
 
If I'm not mistaken, there was a bug when backing up VMs with two or more disks. Check for updates and take a look at the changelog of pve-manager.
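
Something along these lines on the node (apt changelog needs a configured repository for the package; checking qemu-server as well can't hurt):

Code:
apt update && apt full-upgrade
# review what changed in the backup-related packages
apt changelog pve-manager
apt changelog qemu-server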
 
In your case it fails in the middle of the backup, and judging from the error messages and similar issues in the past, it looks like the QEMU process got completely stuck. Please post the output of pveversion -v and qm config 100.

Should the issue happen again, you can run apt install pve-qemu-kvm-dbgsym gdb to install the relevant debug symbols and the debugger, and then obtain backtraces with:
Code:
gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/100.pid)
assuming that 100 is the ID of the VM.
 
Did anyone find a solution for this?
I'm backing up using PBS; all other VMs and CTs before and after this VM have no issues.
It only happens if the backup is a snapshot; if I back up using 'Stop' mode, it works just fine.
It only seems to happen with this specific Windows 11 VM.

Code:
INFO: starting new backup job: vzdump 108 --node homelab --storage puddle --remove 0 --notes-template '{{guestname}}' --notification-mode auto --mode snapshot
INFO: Starting Backup of VM 108 (qemu)
INFO: Backup started at 2025-02-16 07:35:57
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: windows11
INFO: exclude disk 'scsi0' 'test:0.0.1.scsi-36589cfc000000e743858e34c9cf6edc4' (backup=no)
INFO: include disk 'sata0' 'vms-pool:vm-108-disk-3' 306G
INFO: exclude disk 'sata1' 'nas1-games:0.0.1.scsi-36589cfc000000e743858e34c9cf6edc4' (backup=no)
INFO: include disk 'efidisk0' 'vms-pool:vm-108-disk-2' 1M
INFO: include disk 'tpmstate0' 'vms-pool:vm-108-disk-4' 4M
INFO: creating Proxmox Backup Server archive 'vm/108/2025-02-16T07:35:57Z'
INFO: starting kvm to execute backup task
swtpm_setup: Not overwriting existing state file.
ERROR: Backup of VM 108 failed - start failed: command '/usr/bin/kvm -id 108 -name 'windows11,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/108.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/108.pid -daemonize -smbios 'type=1,product=PowerEdge T430,uuid=f5a21620-e40e-46d9-8d20-a4b42020bea7,serial=5ZW2NK2,sku=ITM2CB0067,manufacturer=Dell Inc.' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd' -drive 'if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/zvol/vms-pool/vm-108-disk-2,size=540672' -smp '10,sockets=1,cores=10,maxcpus=10' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vendor_id=proxmox,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+pcid' -m 32768 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=8c48d898-e3ea-4e2a-b8b0-12737ecc0e5e' -device 'qemu-xhci,p2=15,p3=15,id=xhci,bus=pci.1,addr=0x1b' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:b3:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:b3:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -device 'usb-host,bus=xhci.0,port=1,hostbus=1,hostport=1,id=usb0' -device 'usb-host,bus=xhci.0,port=2,hostbus=1,hostport=5,id=usb1' -device 'usb-host,bus=xhci.0,port=3,hostbus=1,hostport=5,id=usb2' -device 'usb-host,bus=xhci.0,port=4,hostbus=1,hostport=14,id=usb3' -chardev 'socket,id=tpmchar,path=/var/run/qemu-server/108.swtpm' -tpmdev 'emulator,id=tpmdev,chardev=tpmchar' -device 'tpm-tis,tpmdev=tpmdev' -chardev 'socket,path=/var/run/qemu-server/108.qga,server=on,wait=off,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:415f4588b9e8' -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/disk/by-id/scsi-36589cfc000000e743858e34c9cf6edc4,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,rotation_rate=1' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=/dev/zvol/vms-pool/vm-108-disk-3,if=none,id=drive-sata0,cache=writeback,discard=on,format=raw,aio=io_uring,detect-zeroes=unmap' -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0,rotation_rate=1,bootindex=100' -drive 'file=/dev/disk/by-id/scsi-36589cfc000000e743858e34c9cf6edc4,if=none,id=drive-sata1,cache=writeback,discard=on,format=raw,aio=io_uring,detect-zeroes=unmap' -device 'ide-hd,bus=ahci0.1,drive=drive-sata1,id=sata1,rotation_rate=1' -netdev 'type=tap,id=net0,ifname=tap108i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=BC:24:11:B1:74:2C,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256' -rtc 'driftfix=slew,base=localtime' -machine 'hpet=off,type=pc-q35-9.0+pve0' -global 'kvm-pit.lost_tick_policy=discard' -cpu 'host,kvm=off,-hypervisor,' -smbios 'type=0' -S' failed: got timeout
INFO: Failed at 2025-02-16 07:40:12
INFO: Backup job finished with errors
TASK ERROR: job errors
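
For reference, a one-off stop-mode run of the same job can be triggered like this (VM ID and storage name taken from the log above):

Code:
vzdump 108 --storage puddle --mode stop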
 
Hi,
Code:
INFO: status = stopped
INFO: backup mode: stop
It is using stop mode backup because the VM was stopped.
Code:
-device 'vfio-pci,host=0000:b3:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:b3:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1
You are using PCI passthrough, which often means the full memory needs to be mapped for DMA up front. Please post the output of pveversion -v; there was a patch a while ago to increase the start timeout in this scenario. How long does it approximately take to start the VM outside of backup (i.e. the time from clicking the button until you see the BIOS screen)?
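
One simple way to measure that from the CLI, assuming the VM is currently stopped:

Code:
time qm start 108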
 
Hi Fiona,
I just noticed that the VM also fails to start at all.

Code:
Feb 17 10:30:52 homelab pvestatd[3263]: status update time (8.493 seconds)
Feb 17 10:31:01 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 10:31:01 homelab pvestatd[3263]: status update time (8.419 seconds)
Feb 17 10:31:11 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 10:31:11 homelab pvedaemon[100590]: VM 108 qmp command failed - VM 108 qmp command 'guest-ping' failed - unable to connect to VM 108 qga socket - timeout after 31 retries
Feb 17 10:31:12 homelab pvestatd[3263]: status update time (8.506 seconds)
Feb 17 10:31:12 homelab pvedaemon[2880511]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 10:31:21 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries

Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-7-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-7
proxmox-kernel-6.8.12-7-pve-signed: 6.8.12-7
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20241112.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
How long did you wait after starting? The log doesn't show the timeout for the start command yet. You could try using a bit less RAM and see if it can start then. What does the current pressure on the system look like (grep '' /proc/pressure/*)?
 
I waited a couple of hours; it seems I can't start it at all.
The host has 256 GB of RAM in total, with the VM disks in a ZFS RAIDZ1 pool.
I tried starting the VM with 16 GB and that worked, although errors still show up in the logs. It takes about 1 min 30 s to start.

journalctl -n 108
Code:
Feb 17 11:00:26 homelab pvedaemon[393249]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - got timeout
Feb 17 11:00:31 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:00:32 homelab pvestatd[3263]: status update time (8.493 seconds)
Feb 17 11:00:41 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:00:41 homelab pvestatd[3263]: status update time (8.435 seconds)
Feb 17 11:00:51 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:00:51 homelab pvedaemon[419542]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:00:52 homelab pvestatd[3263]: status update time (8.478 seconds)
Feb 17 11:01:01 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:01:01 homelab pvestatd[3263]: status update time (8.447 seconds)
Feb 17 11:01:11 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:01:12 homelab pvestatd[3263]: status update time (8.491 seconds)
Feb 17 11:01:17 homelab pvedaemon[100590]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:01:21 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:01:21 homelab pvestatd[3263]: status update time (8.434 seconds)
Feb 17 11:01:31 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:01:32 homelab pvestatd[3263]: status update time (8.488 seconds)
Feb 17 11:01:41 homelab pvestatd[3263]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Feb 17 11:01:41 homelab pvestatd[3263]: status update time (8.465 seconds)
Feb 17 11:01:43 homelab pvedaemon[419542]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries

grep '' /proc/pressure/*
Code:
/proc/pressure/cpu:some avg10=0.15 avg60=0.07 avg300=0.06 total=2705274820
/proc/pressure/cpu:full avg10=0.00 avg60=0.00 avg300=0.00 total=0
/proc/pressure/io:some avg10=0.00 avg60=0.00 avg300=0.00 total=268886489
/proc/pressure/io:full avg10=0.00 avg60=0.00 avg300=0.00 total=239237188
/proc/pressure/memory:some avg10=0.00 avg60=0.00 avg300=0.31 total=1353288317
/proc/pressure/memory:full avg10=0.00 avg60=0.00 avg300=0.31 total=1353089846
 
So it most likely is related to reserving the memory. How much memory is used for ZFS ARC and how much for other VMs?
journalctl -n 108
Code:
Feb 17 11:00:26 homelab pvedaemon[393249]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - got timeout
[...]
Those messages should be harmless in this context. When the QEMU process is seen as running, the status daemon already tries to query it. In your case it's just not ready to respond yet.
grep '' /proc/pressure/*
[...]
Okay, so no memory pressure.
 
I have not set a ZFS ARC limit; /etc/modprobe.d/zfs.conf doesn't exist.
Currently 85 GB is in use, 127 GB is cached, and 34 GB is free.

Should I set a limit in zfs.conf?
 
I'd say that depends on how much memory is available otherwise. How much do the other VMs/containers/services on the node use? If your VM needs to reserve all its memory up front (because of hotplug), it might benefit from more memory being available in total.
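
If you want to check how large the ARC currently is versus its configured cap, one way is to read the kernel stats directly (the values in the kstat file are in bytes):

Code:
awk '/^(size|c_max)/ {printf "%s: %.1f GiB\n", $1, $3/2^30}' /proc/spl/kstat/zfs/arcstats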
 
Hi Fiona,
Just an update on this.
It seems it was a memory allocation issue causing the backups and the VM to fail.
After setting zfs_arc_max in /etc/modprobe.d/zfs.conf, I can see less memory allocated to the ARC, and the VMs and backups run without any issues.
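
For reference, the persistent setting looks roughly like this; the 32 GiB value is only an example, size it so enough RAM stays free for your VMs:

Code:
# /etc/modprobe.d/zfs.conf -- cap the ARC at 32 GiB (value in bytes)
options zfs zfs_arc_max=34359738368
# then rebuild the initramfs so the option is applied at boot:
#   update-initramfs -u -k all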

Thank you for your help.

One last question about the ZFS ARC: I know it allocates free memory for ZFS, but isn't it supposed to free that memory up when the host or the VMs need it?
In TrueNAS SCALE I don't have this problem.
 
Glad to hear :)
Not an expert in this area, but I'd guess it depends on the concrete situation. Sometimes it might not be possible to free the memory up, or at least not fast enough.
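
For completeness: the cap can also be lowered at runtime without a reboot (again, 32 GiB is only an illustrative value):

Code:
echo $((32 * 1024**3)) > /sys/module/zfs/parameters/zfs_arc_max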