Hi everyone,
I've run into an issue that is kind of specific to what I'm using Proxmox for, so I'm not entirely sure if anyone will care to pursue the issue. I've found a work-around at this point, so my use case is 'fixed'. This may also be something for the qemu or kvm teams, so please guide me in the appropriate direction if necessary.
The Configuration:
VNC is enabled in each VM's /etc/pve/qemu-server/###.conf file as follows:
args: -vnc 0.0.0.0:###
(Where ### is the VMID)
Per information here: https://pve.proxmox.com/wiki/VNC_Client_Access
QEMU-Agent is configured and enabled, for what that's worth.
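For anyone trying to reproduce the setup: QEMU's -vnc option takes a display number, and the listener ends up on TCP port 5900 plus that number, so using the VMID as the display number puts VM 101 on port 6001. A minimal sketch of that mapping follows; the helper names are my own and not part of any Proxmox tooling.

```python
VNC_BASE_PORT = 5900  # QEMU listens on 5900 + display number
MAX_TCP_PORT = 65535


def vnc_port(display: int) -> int:
    """Return the TCP port QEMU uses for the given VNC display number."""
    if display < 0:
        raise ValueError("display number must be non-negative")
    port = VNC_BASE_PORT + display
    if port > MAX_TCP_PORT:
        raise ValueError(f"display {display} maps past the TCP port range")
    return port


def vnc_args_line(vmid: int, bind_addr: str = "0.0.0.0") -> str:
    """Build the args: line for ###.conf, using the VMID as display number."""
    return f"args: -vnc {bind_addr}:{vmid}"


if __name__ == "__main__":
    print(vnc_args_line(101))
    print(vnc_port(101))
```

This is the port Guacamole has to be pointed at for each VM.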
Background/The Problem:
We use Proxmox to host VMs for lab work, and clients connect through Apache Guacamole. If a session is first established via the Proxmox noVNC UI client, and a connection is later established via Guacamole (using the VNC protocol and all defaults), the VM abruptly drops offline and the host logs a segfault from KVM/qemu-system.
This worked in previous versions of Proxmox (going from memory, maybe 6.3 or 6.4?). Upgrading to 7.1-10 introduced the problem. We upgraded (technically tore everything down, fresh installed, and moved our VMs) directly to 7.1-10, so I'm unsure exactly where the problem was introduced.
The Work-Around:
Enabling the 'Disable pasting from client' option in the Guacamole Connection Properties seems to fix the issue. I had originally noticed that enabling the 'Read-Only' connection option in Guacamole fixed the issue, so I guessed it had something to do with the input/output side of things rather than the video display and encoding options of VNC. This work-around is perfectly acceptable to me, but I feel like a VM abruptly dropping offline for any reason is cause for concern.
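In case it helps someone apply the same work-around outside the web UI, here is a rough sketch of what the connection would look like in Guacamole's user-mapping.xml format, generated with Python. It assumes file-based authentication (we may not all use that), that `disable-paste` is the parameter behind the 'Disable pasting from client' checkbox, and a made-up connection name and hostname.

```python
import xml.etree.ElementTree as ET


def vnc_connection(name: str, hostname: str, port: int,
                   disable_paste: bool = True) -> ET.Element:
    """Build a <connection> element for a Guacamole VNC connection."""
    conn = ET.Element("connection", {"name": name})
    ET.SubElement(conn, "protocol").text = "vnc"
    params = {
        "hostname": hostname,
        "port": str(port),
        # Assumed to correspond to 'Disable pasting from client' in the UI.
        "disable-paste": "true" if disable_paste else "false",
    }
    for key, value in params.items():
        ET.SubElement(conn, "param", {"name": key}).text = value
    return conn


if __name__ == "__main__":
    # Hypothetical name and host; port 6001 is display 101 (5900 + 101).
    element = vnc_connection("lab-vm-101", "pve.example.internal", 6001)
    print(ET.tostring(element, encoding="unicode"))
```

Setting the `read-only` parameter to true instead should correspond to the 'Read-Only' option I tried first.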
The Details:
I get the following errors logged in the syslog on Proxmox when the crash occurs:
Code:
Apr 25 18:15:06 vmhost_name_redacted pvedaemon[569943]: starting vnc proxy UPID:vmhost_name_redacted:0008B257:108B42D4:62671D6A:vncproxy:101:user@pve:
Apr 25 18:15:06 vmhost_name_redacted pvedaemon[515946]: <user@pve> starting task UPID:vmhost_name_redacted:0008B257:108B42D4:62671D6A:vncproxy:101:user@pve:
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.018557] show_signal_msg: 8 callbacks suppressed
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.018559] kvm[569704]: segfault at 0 ip 0000000000000000 sp 00007ffef50a9798 error 14 in qemu-system-x86_64[55672d4a4000+3d9000]
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.018565] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Apr 25 18:15:10 vmhost_name_redacted pvedaemon[515946]: <user@pve> end task UPID:vmhost_name_redacted:0008B257:108B42D4:62671D6A:vncproxy:101:user@pve: OK
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.087707] fwbr101i0: port 2(tap101i0) entered disabled state
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.088018] fwbr101i0: port 2(tap101i0) entered disabled state
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.168055] vmbr1: port 2(tap101i1) entered disabled state
Apr 25 18:15:10 vmhost_name_redacted kernel: [2775573.168346] vmbr1: port 2(tap101i1) entered disabled state
Apr 25 18:15:10 vmhost_name_redacted systemd[1]: 101.scope: Succeeded.
Apr 25 18:15:10 vmhost_name_redacted systemd[1]: 101.scope: Consumed 17.213s CPU time.
Apr 25 18:15:11 vmhost_name_redacted pvedaemon[569983]: starting vnc proxy UPID:vmhost_name_redacted:0008B27F:108B4499:62671D6E:vncproxy:101:user@pve:
Apr 25 18:15:11 vmhost_name_redacted pvedaemon[510530]: <user@pve> starting task UPID:vmhost_name_redacted:0008B27F:108B4499:62671D6E:vncproxy:101:user@pve:
Apr 25 18:15:11 vmhost_name_redacted qmeventd[569981]: Starting cleanup for 101
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.690114] fwbr101i0: port 1(fwln101i0) entered disabled state
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.690179] vmbr0: port 4(fwpr101p0) entered disabled state
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.690247] device fwln101i0 left promiscuous mode
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.690248] fwbr101i0: port 1(fwln101i0) entered disabled state
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.715555] device fwpr101p0 left promiscuous mode
Apr 25 18:15:11 vmhost_name_redacted kernel: [2775573.715558] vmbr0: port 4(fwpr101p0) entered disabled state
Apr 25 18:15:11 vmhost_name_redacted qmeventd[569981]: Finished cleanup for 101
Apr 25 18:15:11 vmhost_name_redacted qm[569987]: VM 101 qmp command failed - VM 101 not running
Apr 25 18:15:11 vmhost_name_redacted pvedaemon[569983]: Failed to run vncproxy.
Apr 25 18:15:11 vmhost_name_redacted pvedaemon[510530]: <user@pve> end task UPID:vmhost_name_redacted:0008B27F:108B4499:62671D6E:vncproxy:101:user@pve: Failed to run vncproxy.
I think the relevant lines are the 'kvm ... segfault' line and the 'Unable to access opcode bytes at RIP...' line. Based on that, and on some Fedora, KVM, and QEMU posts found elsewhere online, better troubleshooting information (stack traces, debug info, etc.) would be helpful, but so far I haven't been able to get any of the suggested methods for collecting it to work under Proxmox. I'd be happy to collect relevant information if given guidance on how to do so.
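As a small aid for reading the segfault line, here is a sketch that pulls the fields out of it and decodes the error value. It assumes the kernel prints that value in hexadecimal and that it follows the usual x86 page-fault error code bit layout; the function names are mine.

```python
import re

SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
)

# (bit mask, meaning when set, meaning when clear or None to omit)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "non-write access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def parse_segfault(line: str) -> dict:
    """Extract process name, pid, and the hex fields from a segfault line."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    fields = match.groupdict()
    result = {"comm": fields["comm"], "pid": int(fields["pid"])}
    for key in ("addr", "ip", "sp", "error"):
        result[key] = int(fields[key], 16)  # these values are hexadecimal
    return result


def decode_error(code: int) -> list:
    """Describe each page-fault error code bit in plain words."""
    notes = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            notes.append(text)
    return notes


if __name__ == "__main__":
    line = ("kernel: [2775573.018559] kvm[569704]: segfault at 0 "
            "ip 0000000000000000 sp 00007ffef50a9798 error 14 "
            "in qemu-system-x86_64[55672d4a4000+3d9000]")
    info = parse_segfault(line)
    print(info["comm"], hex(info["ip"]), decode_error(info["error"]))
```

Read that way, error 14 (0x14) with both the fault address and the instruction pointer at 0 would be a user-mode instruction fetch from address 0, which looks to me like a call through a null function pointer inside qemu-system-x86_64. That is only my interpretation of the log, not a confirmed diagnosis.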
Other things I've tried:
- Various encodings in the VNC configurations
- Different video adapters in the VM (guests are all Linux, generally Ubuntu 20.04)
- Various VNC parameters in the ###.conf args: line (based on tons of Google searching to find the various possible options)
- Duplicating the issue from a fresh VM on a fresh install of Proxmox with our existing Guacamole installation
I am able to reproduce this issue consistently in my test and production environments, so please let me know if I can provide further details.
Thanks!
Edit: Note, the version of Guacamole I'm testing with has not changed (still v1.3.0). My view is that unless something was deprecated on the Proxmox/QEMU/KVM side of things, Guacamole should be operating as it did previously and would not have introduced this issue. That, together with the logs indicating that KVM is crashing, led me to report this issue from the Proxmox side rather than the Guacamole side of things.