How can I debug "query-proxmox-support' failed - got timeout"? Guest startup hangs, and whole PC restarts.

marcosscriven

I'm running Proxmox 6.3 and trying to start a Linux VM. I see the following logs, and then the whole PC crashes!

I'm not sure how to get more verbose logs, or how to see what's happening with 'query-proxmox-support' and why it would be timing out.

Code:
Mar 26 07:27:46 pve systemd[1]: Started 100.scope.
Mar 26 07:27:46 pve systemd-udevd[3040]: Using default interface naming scheme 'v240'.
Mar 26 07:27:46 pve systemd-udevd[3040]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Mar 26 07:27:46 pve systemd-udevd[3040]: Could not generate persistent MAC address for tap100i0: No such file or directory
Mar 26 07:27:46 pve kernel: [  717.670765] device tap100i0 entered promiscuous mode
Mar 26 07:27:46 pve systemd-udevd[3042]: Using default interface naming scheme 'v240'.
Mar 26 07:27:46 pve systemd-udevd[3042]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Mar 26 07:27:46 pve systemd-udevd[3042]: Could not generate persistent MAC address for fwbr100i0: No such file or directory
Mar 26 07:27:46 pve systemd-udevd[3040]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Mar 26 07:27:46 pve systemd-udevd[3041]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Mar 26 07:27:46 pve systemd-udevd[3041]: Using default interface naming scheme 'v240'.
Mar 26 07:27:46 pve systemd-udevd[3040]: Could not generate persistent MAC address for fwpr100p0: No such file or directory
Mar 26 07:27:46 pve systemd-udevd[3041]: Could not generate persistent MAC address for fwln100i0: No such file or directory
Mar 26 07:27:46 pve kernel: [  717.691759] fwbr100i0: port 1(fwln100i0) entered blocking state
Mar 26 07:27:46 pve kernel: [  717.691765] fwbr100i0: port 1(fwln100i0) entered disabled state
Mar 26 07:27:46 pve kernel: [  717.691828] device fwln100i0 entered promiscuous mode
Mar 26 07:27:46 pve kernel: [  717.691873] fwbr100i0: port 1(fwln100i0) entered blocking state
Mar 26 07:27:46 pve kernel: [  717.691875] fwbr100i0: port 1(fwln100i0) entered forwarding state
Mar 26 07:27:46 pve kernel: [  717.693901] vmbr0: port 3(fwpr100p0) entered blocking state
Mar 26 07:27:46 pve kernel: [  717.693904] vmbr0: port 3(fwpr100p0) entered disabled state
Mar 26 07:27:46 pve kernel: [  717.693934] device fwpr100p0 entered promiscuous mode
Mar 26 07:27:46 pve kernel: [  717.693949] vmbr0: port 3(fwpr100p0) entered blocking state
Mar 26 07:27:46 pve kernel: [  717.693950] vmbr0: port 3(fwpr100p0) entered forwarding state
Mar 26 07:27:46 pve kernel: [  717.695929] fwbr100i0: port 2(tap100i0) entered blocking state
Mar 26 07:27:46 pve kernel: [  717.695932] fwbr100i0: port 2(tap100i0) entered disabled state
Mar 26 07:27:46 pve kernel: [  717.695995] fwbr100i0: port 2(tap100i0) entered blocking state
Mar 26 07:27:46 pve kernel: [  717.695996] fwbr100i0: port 2(tap100i0) entered forwarding state
Mar 26 07:27:52 pve pvedaemon[1184]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - got timeout
Mar 26 07:27:55 pve pvestatd[1165]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Mar 26 07:27:56 pve pvestatd[1165]: status update time (6.439 seconds)
Mar 26 07:27:58 pve pvedaemon[1185]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Mar 26 07:28:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 26 07:28:03 pve pveproxy[1194]: proxy detected vanished client connection
Mar 26 07:28:04 pve pvedaemon[1186]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 28 retries
Mar 26 07:28:04 pve pveproxy[1194]: proxy detected vanished client connection
Mar 26 07:28:04 pve pvedaemon[1185]: <root@pam> successful auth for user 'root@pam'
Mar 26 07:28:06 pve pvestatd[1165]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 30 retries
Mar 26 07:28:08 pve pvestatd[1165]: status update time (9.087 seconds)
Mar 26 07:28:11 pve systemd[1]: Started Session 4 of user root.
Mar 26 07:28:11 pve systemd[1]: pvesr.service: Succeeded.
Mar 26 07:28:11 pve systemd[1]: Started Proxmox VE replication runner.
Mar 26 07:28:16 pve pvedaemon[3037]: start failed: command '/usr/bin/kvm -id 100 -name ubuntu-test-vm -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=918ef9ae-cddb-41ce-b101-945a103a4d31' -smp '6,sockets=1,cores=6,maxcpus=6' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -cpu 'kvm64,enforce,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 16384 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=4cfc98a5-82ae-4a0c-a2cf-ab7355be2f68' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'nec-usb-xhci,id=xhci,bus=pci.1,addr=0x1b' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'vfio-pci,host=0000:08:00.0,id=hostpci0.0,bus=pci.0,addr=0x10.0,x-vga=on,multifunction=on' -device 'vfio-pci,host=0000:08:00.1,id=hostpci0.1,bus=pci.0,addr=0x10.1' -device 'usb-host,bus=xhci.0,hostbus=3,hostport=1.4,id=usb0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8a3cdd8d25e' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=2A:C5:F7:88:2F:EF,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=101' -machine 'type=pc+pve0'' failed: got timeout
Mar 26 07:28:16 pve pvedaemon[1185]: <root@pam> end task UPID:pve:00000BDD:00011825:605D8CF1:qmstart:100:root@pam: start failed: command '/usr/bin/kvm -id 100 -name ubuntu-test-vm -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=918ef9ae-cddb-41ce-b101-945a103a4d31' -smp '6,sockets=1,cores=6,maxcpus=6' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -cpu 'kvm64,enforce,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 16384 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=4cfc98a5-82ae-4a0c-a2cf-ab7355be2f68' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'nec-usb-xhci,id=xhci,bus=pci.1,addr=0x1b' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'vfio-pci,host=0000:08:00.0,id=hostpci0.0,bus=pci.0,addr=0x10.0,x-vga=on,multifunction=on' -device 'vfio-pci,host=0000:08:00.1,id=hostpci0.1,bus=pci.0,addr=0x10.1' -device 'usb-host,bus=xhci.0,hostbus=3,hostport=1.4,id=usb0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8a3cdd8d25e' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=2A:C5:F7:88:2F:EF,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=101' -machine 'type=pc+pve0'' failed: got timeout
Mar 26 07:28:17 pve pvestatd[1165]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Mar 26 07:28:18 pve pvestatd[1165]: status update time (9.411 seconds)
Mar 26 07:28:28 pve pvestatd[1165]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Mar 26 07:28:31 pve pvedaemon[1184]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 21 retries
Mar 26 07:28:35 pve pvestatd[1165]: status update time (15.330 seconds)
Mar 26 07:28:35 pve pvedaemon[1186]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 24 retries
Mar 26 07:28:41 pve pvestatd[1165]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Mar 26 07:28:42 pve pvestatd[1165]: status update time (7.669 seconds)
Mar 26 07:28:51 pve pvestatd[1165]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Mar 26 07:28:52 pve pvestatd[1165]: status update time (8.061 seconds)
Mar 26 07:29:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 26 07:29:01 pve pvestatd[1165]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
Mar 26 07:29:05 pve pvestatd[1165]: VM 101 qmp command failed - VM 101 qmp command 'balloon' failed - got timeout
Mar 26 07:29:06 pve pvestatd[1165]: VM 101 qmp command 'balloon' failed - got timeout
Mar 26 07:29:09 pve pvedaemon[1184]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - got timeout
Mar 26 07:29:25 pve pve-firewall[1158]: firewall update time (12.392 seconds)
Mar 26 07:29:36 pve pve-firewall[1158]: firewall update time (12.254 seconds)
Mar 26 07:29:43 pve pvestatd[1165]: status update time (47.584 seconds)
Mar 26 07:29:56 pve pve-firewall[1158]: firewall update time (18.729 seconds)
Mar 26 07:29:56 pve pvestatd[1165]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 12 retries
Mar 26 07:30:01 pve pvedaemon[1185]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - timeout after 13 retries
Mar 26 07:30:01 pve pvestatd[1165]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - got timeout
Mar 26 07:30:22 pve pveproxy[1194]: proxy detected vanished client connection
Mar 26 07:30:25 pve pvedaemon[1184]: <root@pam> successful auth for user 'root@pam'
Mar 26 07:30:33 pve pveproxy[1194]: proxy detected vanished client connection
Mar 26 07:30:49 pve pve-firewall[1158]: firewall update time (49.106 seconds)
Mar 26 07:30:50 pve pveproxy[1195]: proxy detected vanished client connection
Mar 26 07:30:53 pve systemd[1]: Starting Cleanup of Temporary Directories...
Mar 26 07:30:54 pve pvestatd[1165]: status update time (69.245 seconds)
Mar 26 07:31:49 pve systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Mar 26 07:32:09 pve systemd[1]: Started Cleanup of Temporary Directories.
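
For what it's worth, 'query-proxmox-support' is just a QMP command that Proxmox sends over the VM's monitor socket, so one way to see what's happening is to talk to that socket directly. A rough sketch (this assumes socat is installed; the socket path comes from the log above):

Code:
# QMP requires a capabilities handshake before it accepts any other command.
# A healthy VM prints the QMP greeting plus two JSON replies; a wedged one
# never gets past the greeting.
echo '{"execute":"qmp_capabilities"} {"execute":"query-proxmox-support"}' | \
    socat - UNIX-CONNECT:/var/run/qemu-server/100.qmp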
 
Update - through sheer brute-force fiddling with VM properties, the issue seems to come down to some combination of the 'balloon memory' setting in KVM and passing GPUs through to separate VMs.

These things work:
  • A single VM with balloon memory on, and either or both GPUs passed through
  • Two VMs with balloon memory, and only one with a passed-through GPU (the other using SPICE)
  • Two VMs with no balloon memory, each with their own GPU
This doesn't work:
  • Two VMs with balloon memory, each with their own GPU
I have absolutely no idea why that should be. It's completely reproducible - the moment I start a second VM with a second GPU, it crashes the host system if balloon memory is on. With ballooning off, it works absolutely fine.
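
For anyone hitting the same thing, the workaround amounts to disabling the balloon device on the GPU passthrough VMs. A minimal sketch (the VM IDs here are just mine; in Proxmox, setting balloon to 0 disables the balloon driver entirely):

Code:
# Turn off memory ballooning for both passthrough VMs (balloon=0 disables the driver).
qm set 100 --balloon 0
qm set 101 --balloon 0

In hindsight this is at least plausible: VFIO passthrough pins the guest's memory for device DMA, which presumably doesn't combine well with the balloon driver trying to reclaim pages.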
While I'm happy I've found a workaround, I'd still love to know how to get some low-level logging of what's going wrong, and perhaps submit a QEMU/KVM bug report.
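
If anyone knows a better way, I'm all ears - otherwise, here's a sketch of what I'm planning to try: run the exact KVM command in the foreground so QEMU's errors land on stderr instead of disappearing behind -daemonize (qm showcmd is the stock way to dump the command line Proxmox would run):

Code:
# Dump the full KVM command line for VM 100 to a script.
qm showcmd 100 --pretty > /tmp/vm100-cmd.sh
# Remove the -daemonize flag from that file, then run it in the foreground
# as root and watch stderr. In a second shell, follow the kernel side:
dmesg -w | grep -i -E 'vfio|kvm|balloon'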
 