Once again we have a strange problem: random VMs get a 'qmp socket - timeout' error when opening the console, or 'got timeout' on backups. The issue started after we upgraded Proxmox to the latest version.
We have all the latest updates installed and run an NVMe Ceph cluster.
Some VMs (KVM) work fine and their console is accessible, while others on exactly the same hypervisor do not. It is not related to the guest OS: we currently see the issue on both Linux and Windows VMs.
Does anyone have a clue how to debug this and find the cause?
Starting a VM sometimes doesn't work either:
Code:
TASK ERROR: start failed: command '/usr/bin/kvm -id 192 -name telegram.rare.com -chardev 'socket,id=qmp,path=/var/run/qemu-server/192.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/192.pid -daemonize -smbios 'type=1,uuid=c4e1b094-d09c-49df-bae7-fa2c64fb848f' -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/192.vnc,password -no-hpet -cpu 'kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi,enforce' -m 4000 -object 'memory-backend-ram,id=ram-node0,size=4000M' -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:9617bc5c7589' -drive 'file=rbd:nvme01/vm-192-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/nvme01.keyring,if=none,id=drive-ide0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap192i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=1A:3A:22:6F:3A:FB,netdev=net0,bus=pci.0,addr=0x12,id=net0' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
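If the kvm process was created but never answered, one thing worth checking is whether any of its threads sit in uninterruptible sleep (state D in /proc), which usually means they are blocked on I/O inside the kernel, for example on a hung network filesystem. The sketch below assumes the pidfile pattern /var/run/qemu-server/<vmid>.pid taken from the -pidfile argument above; the function names are hypothetical. Note that with librbd (the -drive 'file=rbd:...' line above) a stuck Ceph request would more likely show up as ordinary sleeping threads, so an empty result does not rule storage out. A single D sample is normal; a thread that stays in D across repeated runs is the interesting signal.

```python
from pathlib import Path

# Hypothetical helper; pidfile pattern taken from the -pidfile argument of the kvm command line.
PIDFILE = "/var/run/qemu-server/{vmid}.pid"


def parse_status(text: str) -> dict:
    """Parse the 'Key:\tvalue' lines of /proc/<pid>/task/<tid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def blocked_threads_of(pid: int, proc: Path = Path("/proc")) -> list:
    """Return sorted (tid, thread name) pairs whose state is D (uninterruptible sleep)."""
    blocked = []
    for task in (proc / str(pid) / "task").iterdir():
        try:
            fields = parse_status((task / "status").read_text())
        except FileNotFoundError:
            continue  # the thread exited while we were scanning
        if fields.get("State", "").startswith("D"):
            blocked.append((int(task.name), fields.get("Name", "?")))
    return sorted(blocked)


def blocked_threads(vmid: int) -> list:
    """Find the kvm process of a VMID via its pidfile and report its blocked threads."""
    pid = int(Path(PIDFILE.format(vmid=vmid)).read_text().strip())
    return blocked_threads_of(pid)


if __name__ == "__main__":
    import sys

    for vmid in map(int, sys.argv[1:]):
        print(vmid, blocked_threads(vmid))
```

Run it as root a few times in a row, e.g. `python3 dstate.py 192 155`, and compare the output between runs.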
Trying to access the console also gives the timeout:
Code:
()
VM 155 qmp command 'change' failed - unable to connect to VM 155 qmp socket - timeout after 599 retries
TASK ERROR: Failed to run vncproxy.
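To check whether the QMP socket itself answers, independently of the GUI and vncproxy, something like the following Python sketch could be run as root on the hypervisor. It assumes the socket path pattern /var/run/qemu-server/<vmid>.qmp seen in the -chardev 'socket,id=qmp,...' argument above and speaks plain QMP: wait for the greeting, send qmp_capabilities, then query-status. The helper names (qmp_session, query_vm_status) are hypothetical. As far as I know, QEMU's socket chardev serves one client at a time, so if another client (for example a hung backup job) still holds the connection, this probe may block too; the timeout makes that visible instead of hanging.

```python
import json
import socket

# Hypothetical helper; path pattern taken from the qmp -chardev argument of the kvm command line.
QMP_SOCKET = "/var/run/qemu-server/{vmid}.qmp"


def _read_message(reader) -> dict:
    """Read one newline-terminated JSON message from the QMP stream."""
    line = reader.readline()
    if not line:
        raise ConnectionError("QMP socket closed by QEMU")
    return json.loads(line)


def _read_reply(reader) -> dict:
    """Return the next command reply, skipping asynchronous events."""
    while True:
        msg = _read_message(reader)
        if "return" in msg or "error" in msg:
            return msg
        # Anything else (e.g. {"event": "RESUME", ...}) is an async notification.


def qmp_session(sock: socket.socket, command: str = "query-status") -> dict:
    """Do the QMP handshake on a connected socket and run one command."""
    reader = sock.makefile("rb")  # read side only; writes go through sendall
    try:
        greeting = _read_message(reader)
        if "QMP" not in greeting:
            raise RuntimeError(f"unexpected greeting: {greeting}")
        reply = {}
        for cmd in ("qmp_capabilities", command):
            sock.sendall(json.dumps({"execute": cmd}).encode() + b"\n")
            reply = _read_reply(reader)
            if "error" in reply:
                raise RuntimeError(f"{cmd} failed: {reply['error']}")
        return reply["return"]
    finally:
        reader.close()


def query_vm_status(vmid: int, timeout: float = 5.0) -> dict:
    """Query a VM over QMP; raises socket.timeout (TimeoutError on 3.10+) if QEMU stays silent."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(QMP_SOCKET.format(vmid=vmid))
        return qmp_session(sock)


if __name__ == "__main__":
    import sys

    for vmid in map(int, sys.argv[1:]):
        try:
            print(vmid, query_vm_status(vmid))
        except (OSError, RuntimeError, ValueError) as exc:  # socket.timeout is an OSError
            print(vmid, "FAILED:", repr(exc))
```

Running it against a working VM and against VM 155 side by side (e.g. `python3 qmpcheck.py 112 155`) should show whether the hang is in QEMU itself or only in the Proxmox tooling around it.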
Backups also fail randomly:
Code:
VMID  NAME            STATUS  TIME      SIZE         FILENAME
112   my.server1.com  OK      00:00:54  1.46GB       /mnt/pve/hyp08-backup/dump/vzdump-qemu-112-2020_02_18-05_00_02.vma.lzo
117   my.server2.com  FAILED  00:00:10  got timeout
121   my.server3.com  FAILED  00:00:13  got timeout
126   my.server4.com  OK      00:03:13  16.46GB      /mnt/pve/hyp08-backup/dump/vzdump-qemu-126-2020_02_18-05_01_19.vma.lzo
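To see whether the backup failures are really random or always hit the same VMs, the per-job summary vzdump prints can be parsed and compared across nights. A minimal sketch, assuming the whitespace-separated VMID NAME STATUS TIME SIZE FILENAME layout shown above; the function names are hypothetical. In this run both failed jobs gave up after 10 to 13 seconds, while the successful ones ran for about a minute or more; that may point at a failure right at job start (such as the QMP step) rather than at slow Ceph reads, but the log alone does not prove it.

```python
def parse_vzdump_summary(text: str) -> list:
    """Parse the VMID/NAME/STATUS/TIME/... summary table into one dict per job."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split(maxsplit=4)
        if not parts or parts[0] == "VMID":
            continue  # skip blank lines and the header row
        vmid, name, status, duration = parts[:4]
        hours, minutes, seconds = map(int, duration.split(":"))
        rows.append({
            "vmid": int(vmid),
            "name": name,
            "status": status,
            "seconds": hours * 3600 + minutes * 60 + seconds,
            # size + filename for OK jobs, the error text (e.g. "got timeout") otherwise
            "detail": parts[4] if len(parts) > 4 else "",
        })
    return rows


def failed_jobs(rows: list) -> list:
    """Return (vmid, seconds until failure, error text) for every non-OK job."""
    return [(r["vmid"], r["seconds"], r["detail"]) for r in rows if r["status"] != "OK"]
```

Collecting `failed_jobs()` from several nightly logs and counting VMIDs would show whether the same guests keep failing.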