Specific Simultaneous VNC Connections to Proxmox VM causing VM to abruptly crash

Jun 18, 2020
Hi everyone,

I've run into an issue that is kind of specific to what I'm using Proxmox for, so I'm not entirely sure if anyone will care to pursue the issue. I've found a work-around at this point, so my use case is 'fixed'. This may also be something for the qemu or kvm teams, so please guide me in the appropriate direction if necessary.

The Configuration:
VNC is enabled in each VM's /etc/pve/qemu-server/###.conf file as follows:
args: -vnc 0.0.0.0:###
(Where ### is the VMID)
Per information here: https://pve.proxmox.com/wiki/VNC_Client_Access
QEMU-Agent is configured and enabled, for what that's worth.

Background/The Problem:
We are using Proxmox to host VMs for lab work where clients connect from Apache Guacamole. The issue is that if a session is established via the Proxmox NoVNC UI Client, and later a connection is established via Guacamole (using the VNC protocol and all defaults), the VM abruptly drops offline with a stack trace from KVM/Qemu-System.

This worked in previous versions of Proxmox (going from memory, maybe 6.3 or 6.4?). Upgrading to 7.1-10 introduced the problem. We upgraded (technically tore everything down, fresh installed, and moved our VMs) directly to 7.1-10, so I'm unsure exactly where the problem was introduced.

The Work-Around:
Enabling the 'Disable pasting from client' option in the Guacamole Connection Properties seems to fix the issue. I had originally noticed that enabling the 'Read-Only' connection option in Guacamole fixed the issue, so I guessed it had something to do with the input/output side of things rather than the video display and encoding options of VNC. This work-around is perfectly acceptable to me, but I feel like a VM abruptly dropping offline for any reason is cause for concern.

The Details:
I get the following errors logged in the syslog on Proxmox when the crash occurs:
Code:
Apr 25 18:15:06 vmhost_name_redacted pvedaemon[569943]: starting vnc proxy UPID:vmhost_name_redacted:0008B257:108B42D4:62671D6A:vncproxy:101:user@pve:
Apr 25 18:15:06 vmhost_name_redacted pvedaemon[515946]: <user@pve> starting task UPID:vmhost_name_redacted:0008B257:108B42D4:62671D6A:vncproxy:101:user@pve:
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.018557] show_signal_msg: 8 callbacks suppressed
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.018559] kvm[569704]: segfault at 0 ip 0000000000000000 sp 00007ffef50a9798 error 14 in qemu-system-x86_64[55672d4a4000+3d9000]
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.018565] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Apr 25 18:15:10 vmhost_name_redacted pvedaemon[515946]: <user@pve> end task UPID:vmhost_name_redacted:0008B257:108B42D4:62671D6A:vncproxy:101:user@pve: OK
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.087707] fwbr101i0: port 2(tap101i0) entered disabled state
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.088018] fwbr101i0: port 2(tap101i0) entered disabled state
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.168055] vmbr1: port 2(tap101i1) entered disabled state
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.168346] vmbr1: port 2(tap101i1) entered disabled state
Apr 25 18:15:10 vmhost_name_redacted systemd[1]: 101.scope: Succeeded.
Apr 25 18:15:10 vmhost_name_redacted systemd[1]: 101.scope: Consumed 17.213s CPU time.
Apr 25 18:15:11 vmhost_name_redacted pvedaemon[569983]: starting vnc proxy UPID:vmhost_name_redacted:0008B27F:108B4499:62671D6E:vncproxy:101:user@pve:
Apr 25 18:15:11 vmhost_name_redacted pvedaemon[510530]: <user@pve> starting task UPID:vmhost_name_redacted:0008B27F:108B4499:62671D6E:vncproxy:101:user@pve:
Apr 25 18:15:11 vmhost_name_redacted qmeventd[569981]: Starting cleanup for 101
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.690114] fwbr101i0: port 1(fwln101i0) entered disabled state
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.690179] vmbr0: port 4(fwpr101p0) entered disabled state
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.690247] device fwln101i0 left promiscuous mode
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.690248] fwbr101i0: port 1(fwln101i0) entered disabled state
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.715555] device fwpr101p0 left promiscuous mode
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.715558] vmbr0: port 4(fwpr101p0) entered disabled state
Apr 25 18:15:11 vmhost_name_redacted qmeventd[569981]: Finished cleanup for 101
Apr 25 18:15:11 vmhost_name_redacted qm[569987]: VM 101 qmp command failed - VM 101 not running
Apr 25 18:15:11 vmhost_name_redacted pvedaemon[569983]: Failed to run vncproxy.
Apr 25 18:15:11 vmhost_name_redacted pvedaemon[510530]: <user@pve> end task UPID:vmhost_name_redacted:0008B27F:108B4499:62671D6E:vncproxy:101:user@pve: Failed to run vncproxy.

I think the relevant lines are the ones containing 'kvm' and 'segfault...', followed by 'Unable to access opcode bytes at RIP...'. Based on that, and on some Fedora, KVM, and Qemu posts found elsewhere online, it appears that better troubleshooting information (stack traces, debug info, etc.) would be helpful, but so far I have not been able to get any of the suggested methods for collecting it to work under Proxmox. I'd be happy to help by collecting relevant information if given guidance on how to do so.
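
As a rough sketch of how those lines could be pulled out of a larger log, the filter below keeps only the kvm segfault entries and the accompanying opcode message. The function name `find_kvm_segfaults` is made up for illustration, and the log location in the usage comment is an assumption (depending on the setup, the kernel messages may only be available through the journal).

```shell
#!/bin/bash
# Hypothetical helper: keep only the kernel lines that describe a kvm segfault.
# Reads log text on stdin and prints the matching lines.
find_kvm_segfaults() {
    grep -E 'kvm\[[0-9]+\]: segfault|Unable to access opcode bytes'
}

# Example (assumed log path, adjust as needed):
#   find_kvm_segfaults < /var/log/syslog
```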

Other things I've tried:
- Various encodings in the VNC configurations
- Different video adapters in the VM (guests are all Linux, generally Ubuntu 20.04)
- Various VNC parameters in the ###.conf args: line (based on tons of Google searching to find the various possible options)
- Duplicating the issue from a fresh VM on a fresh install of Proxmox with our existing Guacamole installation

I am able to reproduce this issue consistently in my test and production environments. So please let me know if I can provide further details.

Thanks!

Edit: Note, the version of Guacamole I'm testing with has not changed (still v1.30). My view is that unless something was deprecated on the Proxmox/Qemu/KVM side of things, Guacamole should be operating as it was previously and would not have introduced this issue. This, in addition to the logs indicating that KVM is crashing, led me to start the process of reporting this issue from the Proxmox side rather than the Guacamole side of things. :)
 
Thank you for reporting this!

Does it happen with just Guacamole as well, or do you need both NoVNC and Guacamole to be connected at the same time?

Could you start the VM in the foreground by getting the command with qm showcmd <VMID> --pretty, removing the -daemonize \ line and then starting it?
Maybe this way there will be more information regarding the segfault.
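
A minimal sketch of that step, assuming the pretty-printed output puts each option on its own line ending in a backslash (as in the launch script shown later in this thread). The helper name `strip_daemonize` and the /tmp path are illustrative only. If -daemonize happened to be the very last option, the preceding line would be left with a dangling backslash and would need a manual fix.

```shell
#!/bin/bash
# Hypothetical helper: drop the "-daemonize \" line from a pretty-printed
# QEMU command read on stdin, leaving every other line untouched.
strip_daemonize() {
    grep -v -E '^[[:space:]]*-daemonize[[:space:]]*\\?[[:space:]]*$'
}

# Usage on a Proxmox host (not executed here; VMID 101 is an example):
#   { echo '#!/bin/bash'; qm showcmd 101 --pretty | strip_daemonize; } > /tmp/101_launch.sh
#   chmod +x /tmp/101_launch.sh
#   /tmp/101_launch.sh
```

The VM should be stopped before the script is started, since the generated command reuses the same pidfile and sockets.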
 
Guacamole itself generally works fine, as long as I stay away from using the Proxmox UI at the same time. Additionally (and weirdly), other VNC clients (Specifically the TigerVNC Mac OS client) seem to co-exist and can connect at the same time perfectly fine.

I launched via a temporary BASH script, this was the output once I made the connection and crashed the VM:
Code:
root@vmhost_name_redacted:~# /tmp/101_launch.sh
/tmp/101_launch.sh: line 38: 765587 Segmentation fault      /usr/bin/kvm -id 101 -name ubuntu2004 -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/101.pid -smbios 'type=1,uuid=09ce2ff0-dda1-421b-946b-e610c9756082' -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/101.vnc,password=on' -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 4096 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=5f0bd539-9959-49e5-ae8b-b800ac97c791' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-vga,id=vga,bus=pci.0,addr=0x2' -chardev 'socket,path=/var/run/qemu-server/101.qga,server=on,wait=off,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e34dd9a3cdcb' -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -drive 'file=/var/lib/vz/images/101/vm-101-disk-0.qcow2,if=none,id=drive-virtio0,cache=writethrough,discard=on,format=qcow2,aio=io_uring,detect-zeroes=unmap' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=CA:5D:F0:B9:06:36,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' -netdev 
'type=tap,id=net1,ifname=tap101i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=92:13:0A:6E:88:98,netdev=net1,bus=pci.0,addr=0x13,id=net1' -machine 'type=pc+pve0' -vnc 0.0.0.0:101

It looks like line 38 is just the last line of the temporary BASH file:
Bash:
  1 #!/bin/bash
  2 /usr/bin/kvm \
  3   -id 101 \
  4   -name ubuntu2004 \
  5   -no-shutdown \
  6   -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server=on,wait=off' \
  7   -mon 'chardev=qmp,mode=control' \
  8   -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  9   -mon 'chardev=qmp-event,mode=control' \
 10   -pidfile /var/run/qemu-server/101.pid \
 11   -smbios 'type=1,uuid=09ce2ff0-dda1-421b-946b-e610c9756082' \
 12   -smp '2,sockets=1,cores=2,maxcpus=2' \
 13   -nodefaults \
 14   -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
 15   -vnc 'unix:/var/run/qemu-server/101.vnc,password=on' \
 16   -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
 17   -m 4096 \
 18   -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
 19   -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
 20   -device 'vmgenid,guid=5f0bd539-9959-49e5-ae8b-b800ac97c791' \
 21   -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
 22   -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
 23   -device 'virtio-vga,id=vga,bus=pci.0,addr=0x2' \
 24   -chardev 'socket,path=/var/run/qemu-server/101.qga,server=on,wait=off,id=qga0' \
 25   -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' \
 26   -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' \
 27   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
 28   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e34dd9a3cdcb' \
 29   -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
 30   -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' \
 31   -drive 'file=/var/lib/vz/images/101/vm-101-disk-0.qcow2,if=none,id=drive-virtio0,cache=writethrough,discard=on,format=qcow2,aio=io_uring,detect-zeroes=unmap' \
 32   -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' \
 33   -netdev 'type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
 34   -device 'virtio-net-pci,mac=CA:5D:F0:B9:06:36,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' \
 35   -netdev 'type=tap,id=net1,ifname=tap101i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
 36   -device 'virtio-net-pci,mac=92:13:0A:6E:88:98,netdev=net1,bus=pci.0,addr=0x13,id=net1' \
 37   -machine 'type=pc+pve0' \
 38   -vnc 0.0.0.0:101
 
Could you install the debug package for pve-qemu-kvm? apt install pve-qemu-kvm-dbg
And then run it inside gdb. Once it segfaults, please provide both a backtrace (bt) and to be safe a backtrace of all threads (thread apply all bt).
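
One possible way to script this, as a sketch rather than the required procedure: the wrapper below starts a command under gdb in batch mode and prints both backtraces once the process stops. The name `run_under_gdb` and the `GDB` override variable are made up for this example. Ignoring SIGUSR1 is an assumption based on QEMU using that signal internally; without it, gdb might stop on the first such signal instead of on the segfault.

```shell
#!/bin/bash
# Hypothetical wrapper: run the given command under gdb and dump backtraces
# when it stops. GDB may be overridden; it defaults to the gdb binary.
run_under_gdb() {
    # Assumption: QEMU uses SIGUSR1 internally, so gdb is told not to stop on it.
    "${GDB:-gdb}" -batch \
        -ex 'handle SIGUSR1 nostop noprint' \
        -ex run \
        -ex bt \
        -ex 'thread apply all bt' \
        --args "$@"
}

# Usage (not executed here): pass the kvm binary and the options taken
# from the foreground launch script, for example
#   run_under_gdb /usr/bin/kvm -id 101 -name ubuntu2004 ...
```

Redirecting the output to a file makes it easier to attach the result to a forum post.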
 
Hi mira, here are the results:

From gdb bt:
Code:
#0  0x0000000000000000 in ?? ()
#1  0x0000560bb69164dd in protocol_client_msg (vs=0x560bb9c334e0, data=0x560bb96f0620 "\006", len=12) at ../ui/vnc.c:2459
#2  0x0000560bb6913a77 in vnc_client_read (vs=0x560bb9c334e0) at ../ui/vnc.c:1621
#3  vnc_client_io (ioc=<optimized out>, opaque=0x560bb9c334e0, condition=G_IO_IN) at ../ui/vnc.c:1649
#4  vnc_client_io (ioc=<optimized out>, condition=G_IO_IN, opaque=0x560bb9c334e0) at ../ui/vnc.c:1636
#5  0x00007f3fc28ced6f in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6  0x0000560bb6e2ef40 in glib_pollfds_poll () at ../util/main-loop.c:232
#7  os_host_main_loop_wait (timeout=24138073) at ../util/main-loop.c:255
#8  main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:531
#9  0x0000560bb6bc7ce1 in qemu_main_loop () at ../softmmu/runstate.c:726
#10 0x0000560bb68f70ee in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50

I've attached the full backtrace.
 

Attachments

  • Thread 5 Thread 0x7f3ea5fbf700 - Full Backtrace.txt
    12.1 KB
Thank you for the backtrace!

This helped narrow down the issue. It's VNC clipboard which segfaults.
Clipboard support was only introduced in QEMU 6.1.0. That explains why it only started to happen after upgrading to PVE 7.1.
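
To check whether a given host is new enough to be affected, the installed QEMU version can be compared against 6.1.0. The helper below is only a sketch: `version_ge` is a made-up name, it relies on `sort -V` from GNU coreutils, and the version string has to be supplied by hand (for example after reading it from `kvm --version` or `pveversion -v`).

```shell
#!/bin/bash
# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: a version string typed in by hand, not read from the system.
qemu_version="6.1.0"
if version_ge "$qemu_version" "6.1.0"; then
    echo "QEMU $qemu_version includes VNC clipboard support"
else
    echo "QEMU $qemu_version predates VNC clipboard support"
fi
```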

Have you tried the other way around, first opening Guacamole and then opening the NoVNC console?
 
Hi mira,

Yes, opening Guacamole and then the NoVNC console seems to work properly.

Thanks!

Edit: However, even after this, if the Guacamole session is reconnected in any way (even sometimes switching browser tabs), the issue occurs.
 
Unfortunately, I'm running into a very similar issue, except that I'm running VNC through Remmina rather than NoVNC, and it happens unpredictably rather than consistently. It gives similar errors, though. I may just have to turn off clipboard sharing... pity
```
[3817118.422625] kvm[819591]: segfault at 0 ip 0000000000000000 sp 00007ffc007512f8 error 14 in qemu-system-x86_64[55613af3d000+31d000] likely on CPU 74 (core 27, socket 0)
[3817118.422657] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
```
 
