VM crash on console attempt

gradinaruvasile

Well-Known Member
Oct 22, 2015
Hi,

I'm observing a weird issue on Proxmox related to the web-based VNC console.
Sometimes when I click on a Windows 2016 VM's VNC console, the VM crashes.
This happens on a server with fairly recent updates, but not on another server in the same cluster that is a bit behind on updates.
I suspected it might be an AMD issue, since the server with the latest updates has an EPYC CPU (the server runs mighty fine, it trounces the Dell in I/O for some reason despite having the same network connectivity), but I saw a thread on Reddit by someone who had the same thing happen (on Windows 2012) on a Dell R610, which is Intel based.
He was running proxmox-ve 5.2.2 with kernel 4.15.18-8.
My HP server has proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve) and the Dell 710 has proxmox-ve: 5.2-2 (running kernel: 4.15.17-2-pve). The Dell 710 has no such issues.

Does anyone have an idea why this is happening and how it can be debugged?

Debug info:
Ran 'pveversion --verbose' on my servers.
Problematic server: HP DL385 G10 (single EPYC 7281 CPU):

Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

"Good server" Dell 710:

Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.17-2-pve)
pve-manager: 5.2-11 (running version: 5.2-11/13c2da63)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.4.128-1-pve: 4.4.128-111
pve-kernel-4.4.98-3-pve: 4.4.98-103
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.2.3-2-pve: 4.2.3-22
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-1
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-41
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-10
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-40
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
 
Sometimes when I click on a Windows 2016 VM's VNC console, the VM crashes.
Does anyone have an idea why this is happening and how it can be debugged?

Hmm, sounds quite strange. Anything in the Windows event log around that time, or in the host's syslog / dmesg output?

Have you ensured that all variable frequency settings are off in the firmware? Some power saving features have also caused problems in the past, but it's a bit hard to tell for such a strange issue.
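If it happens again, something along these lines should capture the relevant host-side context right away (100 is just a placeholder VM ID, substitute the one that crashed):

Code:
# host logs from the last 15 minutes, plus recent kernel messages
journalctl --since "-15min" > /tmp/journal-crash.txt
dmesg -T | tail -n 200 > /tmp/dmesg-crash.txt
# configuration and current state of the affected VM (100 is a placeholder)
qm config 100
qm status 100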
 
Nothing there...

Edit:

It happened on the Dell server too, even on a Windows 10 VM. Unfortunately it is not 100% reproducible.
 
Had this happen on a Windows Server 2012 R2 VM.

Syslog shows a "Connection reset by peer" message from what I assume is the connecting client at the time the VM crashed on console, but nothing else.

Dell R610 Rev II here.
 
Are there any updates on this?

Just hit this issue twice within a one-minute period. It brought down two Windows servers during updates (presumably triggered by their configured WSUS windows). I'm currently running Proxmox in a lab environment to experiment with it and try it out, and this is very off-putting when it comes to the business decision of putting Proxmox onto a production system, if it can't reliably do something as simple as opening a console without crashing a VM.

Code:
Mar 10 11:40:48 pve-host-a pvedaemon[3479]: <root@pam> starting task UPID:pve-host-a:0000608C:001B0BA3:5C853000:vncproxy:118:root@pam:
Mar 10 11:40:48 pve-host-a pvedaemon[24716]: starting vnc proxy UPID:pve-host-a:0000608C:001B0BA3:5C853000:vncproxy:118:root@pam:
Mar 10 11:40:54 pve-host-a pveproxy[3558]: problem with client 192.168.1.1; Connection reset by peer
Mar 10 11:40:54 pve-host-a pvedaemon[3479]: <root@pam> end task UPID:pve-host-a:0000608C:001B0BA3:5C853000:vncproxy:118:root@pam: OK
Mar 10 11:40:55 pve-host-a pvedaemon[25260]: starting vnc proxy UPID:pve-host-a:000062AC:001B0E2B:5C853007:vncproxy:118:root@pam:
Mar 10 11:40:55 pve-host-a pvedaemon[3477]: <root@pam> starting task UPID:pve-host-a:000062AC:001B0E2B:5C853007:vncproxy:118:root@pam:
Mar 10 11:40:55 pve-host-a kernel: [17730.267425] vmbr0v1291: port 5(tap118i0) entered disabled state
Mar 10 11:40:55 pve-host-a kernel: [17730.267794] vmbr0v1291: port 5(tap118i0) entered disabled state
Mar 10 11:40:56 pve-host-a qmeventd[2625]: Starting cleanup for 118
Mar 10 11:40:56 pve-host-a qmeventd[2625]: Finished cleanup for 118
Mar 10 11:40:56 pve-host-a qm[25262]: VM 118 qmp command failed - VM 118 not running
Mar 10 11:40:56 pve-host-a pvedaemon[25260]: Failed to run vncproxy.
Mar 10 11:40:56 pve-host-a pvedaemon[3477]: <root@pam> end task UPID:pve-host-a:000062AC:001B0E2B:5C853007:vncproxy:118:root@pam: Failed to run vncproxy.

Code:
Mar 10 11:41:23 pve-host-a pvedaemon[26543]: starting vnc proxy UPID:pve-host-a:000067AF:001B1923:5C853023:vncproxy:121:root@pam:
Mar 10 11:41:23 pve-host-a pvedaemon[3477]: <root@pam> starting task UPID:pve-host-a:000067AF:001B1923:5C853023:vncproxy:121:root@pam:
Mar 10 11:41:25 pve-host-a pvedaemon[3477]: <root@pam> end task UPID:pve-host-a:000067AF:001B1923:5C853023:vncproxy:121:root@pam: OK
Mar 10 11:41:25 pve-host-a kernel: [17760.242109] vmbr0v1291: port 8(tap121i0) entered disabled state
Mar 10 11:41:25 pve-host-a kernel: [17760.242632] vmbr0v1291: port 8(tap121i0) entered disabled state
Mar 10 11:41:25 pve-host-a pvedaemon[26903]: starting vnc proxy UPID:pve-host-a:00006917:001B19EC:5C853025:vncproxy:121:root@pam:
Mar 10 11:41:25 pve-host-a pvedaemon[3479]: <root@pam> starting task UPID:pve-host-a:00006917:001B19EC:5C853025:vncproxy:121:root@pam:
Mar 10 11:41:26 pve-host-a qm[27350]: VM 121 qmp command failed - VM 121 not running
Mar 10 11:41:26 pve-host-a pvedaemon[26903]: Failed to run vncproxy.
Mar 10 11:41:26 pve-host-a qmeventd[2625]: Starting cleanup for 121
Mar 10 11:41:26 pve-host-a pvedaemon[3479]: <root@pam> end task UPID:pve-host-a:00006917:001B19EC:5C853025:vncproxy:121:root@pam: Failed to run vncproxy.
Mar 10 11:41:26 pve-host-a qmeventd[2625]: Finished cleanup for 121

Code:
[16838.344202] device tap109i0 entered promiscuous mode
[16838.369975] vmbr0v1291: port 4(tap109i0) entered blocking state
[16838.369978] vmbr0v1291: port 4(tap109i0) entered disabled state
[16838.370097] vmbr0v1291: port 4(tap109i0) entered blocking state
[16838.370099] vmbr0v1291: port 4(tap109i0) entered forwarding state
[16843.255705] device tap118i0 entered promiscuous mode
[16843.279064] vmbr0v1291: port 5(tap118i0) entered blocking state
[16843.279067] vmbr0v1291: port 5(tap118i0) entered disabled state
[16843.279229] vmbr0v1291: port 5(tap118i0) entered blocking state
[16843.279232] vmbr0v1291: port 5(tap118i0) entered forwarding state
[16844.188795] device tap119i0 entered promiscuous mode
[16844.217236] vmbr0v1291: port 6(tap119i0) entered blocking state
[16844.217240] vmbr0v1291: port 6(tap119i0) entered disabled state
[16844.217395] vmbr0v1291: port 6(tap119i0) entered blocking state
[16844.217398] vmbr0v1291: port 6(tap119i0) entered forwarding state
[16846.440860] device tap120i0 entered promiscuous mode
[16846.462291] vmbr0v1291: port 7(tap120i0) entered blocking state
[16846.462293] vmbr0v1291: port 7(tap120i0) entered disabled state
[16846.462459] vmbr0v1291: port 7(tap120i0) entered blocking state
[16846.462461] vmbr0v1291: port 7(tap120i0) entered forwarding state
[16848.035970] device tap121i0 entered promiscuous mode
[16848.056046] vmbr0v1291: port 8(tap121i0) entered blocking state
[16848.056049] vmbr0v1291: port 8(tap121i0) entered disabled state
[16848.056169] vmbr0v1291: port 8(tap121i0) entered blocking state
[16848.056171] vmbr0v1291: port 8(tap121i0) entered forwarding state
[16852.926259] device tap102i0 entered promiscuous mode
[16852.967496] vmbr0v1290: port 5(tap102i0) entered blocking state
[16852.967501] vmbr0v1290: port 5(tap102i0) entered disabled state
[16852.967723] vmbr0v1290: port 5(tap102i0) entered blocking state
[16852.967728] vmbr0v1290: port 5(tap102i0) entered forwarding state
[16854.293331] device tap103i0 entered promiscuous mode
[16854.328445] vmbr0v1290: port 6(tap103i0) entered blocking state
[16854.328449] vmbr0v1290: port 6(tap103i0) entered disabled state
[16854.328630] vmbr0v1290: port 6(tap103i0) entered blocking state
[16854.328633] vmbr0v1290: port 6(tap103i0) entered forwarding state
[16856.242984] device tap104i0 entered promiscuous mode
[16856.272987] vmbr0v1291: port 9(tap104i0) entered blocking state
[16856.272992] vmbr0v1291: port 9(tap104i0) entered disabled state
[16856.273218] vmbr0v1291: port 9(tap104i0) entered blocking state
[16856.273222] vmbr0v1291: port 9(tap104i0) entered forwarding state
[17730.267425] vmbr0v1291: port 5(tap118i0) entered disabled state
[17730.267794] vmbr0v1291: port 5(tap118i0) entered disabled state
[17760.242109] vmbr0v1291: port 8(tap121i0) entered disabled state
[17760.242632] vmbr0v1291: port 8(tap121i0) entered disabled state

Code:
root@pve-host-a:/var/log# pveversion
pve-manager/5.3-9/ba817b29 (running kernel: 4.15.18-11-pve)
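
In case it's useful, this is roughly what I plan to check next time it happens, to see whether the kvm process was OOM-killed or actually crashed (the times match the log excerpt above; coredumpctl only works if systemd-coredump is installed):

Code:
# look for OOM kills or segfaults around the time the VMs went down
journalctl -k --since "11:35:00" --until "11:45:00" | grep -iE 'oom|segfault|kill'
# a QEMU crash should show up here, if systemd-coredump is installed
coredumpctl list | grep -i kvm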
 
I personally have no experience with current AMD server-class CPUs, but I have never seen such an error on Intel-based DL380/360 machines from various generations. Unfortunately, your log does not show any useful information, because your guest VM crashed, so you need logs from your guest, not from your hypervisor. What display settings have you chosen for the VM, and have you installed any drivers inside the guest?
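
If you want to post them for comparison, the display-related parts of the VM config can be pulled on the host like this (118 being the VM ID from your log excerpt):

Code:
# display, machine type, CPU and OS type settings of the affected VM
qm config 118 | grep -E '^(vga|machine|cpu|ostype|bios):'
# full QEMU command line actually used to start the VM
qm showcmd 118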
 
