Hi all,
We are running Proxmox on a single server (no cluster), and the VMs are stored locally on LVM storage.
Currently the system is running ~10-12 VMs. Most of them are Linux server systems, but we are also running two Windows machines.
During everyday use, all VMs were running fine and no issues occurred (SSH, HTTP, SMB, RDP were all behaving as expected).
Today, my colleague tried to boot up a Windows VM that had been offline for some weeks. He realized he could not get an RDP connection, so he tried to access the machine via the Proxmox noVNC console.
However, this was not working either. The Console tab shows "Verbinden..." ("Connecting..." in English) and after a while the following error appears:
Code:
VM 114 qmp command 'change' failed - unable to connect to VM 114 qmp socket - timeout after 600 retries
TASK ERROR: Failed to run vncproxy.
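In case it helps with diagnosing this, these are the kind of checks I can run for the affected VM (VMID 114 taken from the error above) and post the output of:
Code:
# does Proxmox still consider the VM running?
qm status 114 --verbose
# is the QMP socket still present?
ls -l /var/run/qemu-server/114.qmp
# is the kvm process for this VM still alive?
pgrep -af 'kvm -id 114'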
This is when I got involved in this issue (I am sort of the main administrator of the Proxmox instance).
So far I found out / tried the following:
- For almost all of the VMs the noVNC console is not working.
- For some non-critical systems I tried to initiate a shutdown via the Web GUI, as well as via qm shutdown <vmid>. The systems with a non-working noVNC console cannot be shut down this way; the task fails with "VM quit/powerdown failed".
- I logged into some of the Linux VMs via SSH and initiated a shutdown from inside the guest. I then restarted these VMs via the Web GUI and they are now working as expected (noVNC is showing, and I can now initiate a shutdown via the GUI too).
- I tried the same for one of the (up to that point still working) Windows VMs. It does not come back online and is accessible neither via RDP nor noVNC. While starting the Windows VM seems successful via the Web GUI, starting it via qm start <vmid> shows the following:
Code:
start failed: command '/usr/bin/kvm -id 118 -name Office-36 -chardev 'socket,id=qmp,path=/var/run/qemu-server/118.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/118.pid -daemonize -smbios 'type=1,uuid=0bb73145-4d1e-451a-9114-e681e89baeea' -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/118.vnc,password -no-hpet -cpu 'kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 3072 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=c026deef-a4ff-4772-8028-487a4e00c961' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:4aa886956b8e' -drive 'file=/dev/pve/vm-118-disk-0,if=none,id=drive-ide0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -netdev 'type=tap,id=net0,ifname=tap118i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=82:93:FB:78:4C:09,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
- The problem also occurs if I try to create a new Windows VM from scratch. Not even the initial boot into the installer ISO is working.
- The problem does not occur if I create a new Linux VM from scratch. The installer loads as expected.
- I installed the latest packages (apt update && apt upgrade) but nothing changed (some further checks I could run are sketched after this list).
- I have not yet restarted the hypervisor itself. One of the Windows VMs is still running and I would prefer not to lose access to it too, as it is in daily use.
- I searched for the issue on the Proxmox forum and found this thread, which describes a similar behaviour regarding the "Failed to run vncproxy" part. However, the issue described there was resolved with an upgrade to pve-qemu-kvm 4.0.0-7, while we are already running pve-qemu-kvm/stable 5.0.0-11 amd64.
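These are the further checks I have in mind; the grep pattern is just my guess at what might be relevant, and 118 is the VM from the failed start above:
Code:
# any packages left in a partially upgraded state?
apt list --upgradable
# kernel / KVM messages from the current boot
dmesg | tail -n 50
# daemon and system log entries around the failed start attempts
journalctl -b --since "today" | grep -iE 'qmp|vnc|kvm|118'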
What could have led to this error?
A week ago I updated the packages on the Proxmox instance. After that, I did not reboot the hypervisor. As we do not use the noVNC feature on a daily basis, it could very well be that one of the updates caused the problem and it remained undetected until today.
However, I have one Linux VM with an uptime of 6 days (rebooted right after the update of Proxmox) which is working perfectly, while another Linux VM with an uptime of 4 days (rebooted multiple days after the update) is not working.
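If it helps to narrow this down, I assume the QEMU version each still-running VM was actually started with can be compared against the installed package roughly like this (I have not collected the outputs yet):
Code:
# installed pve-qemu-kvm package
pveversion -v | grep pve-qemu-kvm
# QEMU version a still-running VM was started with (HMP monitor)
qm monitor <vmid>
# then, at the qm> prompt:
info version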
Likely not related, but something I changed recently:
I introduced named admin accounts for my colleagues and me and deactivated the default root account. This had some unexpected side effects, such as updates no longer being possible via the Web GUI. Therefore, I updated via the CLI using apt. To rule this out as a possible cause, I reactivated the root account today; the issues persist.
Information on our system:
Code:
root@aphrodite:/var/log# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.0.21-1-pve)
pve-manager: 6.2-10 (running version: 6.2-10/a20769ed)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-1
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-9
pve-cluster: 6.1-8
pve-container: 3.1-12
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-11
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-11
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
qm config for one of the Windows VMs which are not working (removing the ISO file does not change anything):
Code:
root@aphrodite:/var/log# qm config 114
bootdisk: ide0
cores: 2
ide0: local-lvm:vm-114-disk-0,size=45G
ide2: local:iso/SW_DVD5_Office_Professional_Plus_2019_32_BIT_X64_German_C2R_X21-84630.ISO,media=cdrom,size=3431288K
memory: 3072
name: REDACTED
net0: e1000=02:49:BE:86:7B:F0,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
parent: REDACTED
scsihw: virtio-scsi-pci
smbios1: uuid=c5639d23-a61e-4fa7-bcd2-6e6c7c546fe3
sockets: 1
vmgenid: 200b9229-0fb8-4357-a4f9-664522104366
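For reference, I removed the ISO via the GUI; as far as I understand, the CLI equivalent would be something like this, which I could redo if the exact way matters:
Code:
# detach the ISO but keep an empty CD-ROM drive on ide2
qm set 114 --ide2 none,media=cdrom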
I am out of ideas and hope that someone might be able to help. If required, I could organize a reboot of the hypervisor relatively quickly. A short downtime of the VMs is no problem, but I am afraid that we would then be left without any working Windows VM.
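Before any reboot I would probably back up the still-running Windows VM first; I assume something along these lines would work (VMID and storage name are placeholders):
Code:
# snapshot-mode backup of the still-working Windows VM
vzdump <vmid> --mode snapshot --storage <backup-storage>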
Regards,
Phil