PveDaemon SegFault / VNCProxy

Original message here: https://forum.proxmox.com/threads/proxmox-ve-8-3-released.157793/post-724494

- While working on VM 201:

Code:
Nov 27 15:20:15 pve pvedaemon[59586]: worker exit
Nov 27 15:20:15 pve pvedaemon[4943]: worker 59586 finished
Nov 27 15:20:15 pve pvedaemon[4943]: starting 1 worker(s)
Nov 27 15:20:15 pve pvedaemon[4943]: worker 92736 started
Nov 27 15:21:14 pve pvedaemon[58534]: worker exit
Nov 27 15:21:14 pve pvedaemon[4943]: worker 58534 finished
Nov 27 15:21:14 pve pvedaemon[4943]: starting 1 worker(s)
Nov 27 15:21:14 pve pvedaemon[4943]: worker 93327 started
Nov 27 15:22:15 pve kernel:  zd320: p1 p2 p3
Nov 27 15:22:15 pve lvm[93987]: /dev/zd320p3 excluded: device is rejected by filter config.
Nov 27 15:22:15 pve ovs-vsctl[93994]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln201i0
Nov 27 15:22:15 pve ovs-vsctl[93994]: ovs|00002|db_ctl_base|ERR|no port named fwln201i0
Nov 27 15:22:15 pve ovs-vsctl[93995]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap201i0
Nov 27 15:22:15 pve qmeventd[4162]: read: Connection reset by peer
Nov 27 15:22:15 pve pvedaemon[92736]: VM 201 qmp command failed - VM 201 not running
Nov 27 15:22:15 pve systemd[1]: 201.scope: Deactivated successfully.
Nov 27 15:22:15 pve systemd[1]: 201.scope: Consumed 6min 35.090s CPU time.
Nov 27 15:22:15 pve qmeventd[94001]: Starting cleanup for 201
Nov 27 15:22:15 pve qmeventd[94001]: Finished cleanup for 201
Nov 27 15:23:29 pve pvedaemon[85885]: <root@pam> starting task UPID:pve:000171F3:02857A43:67471D51:vncproxy:802:root@pam:
Nov 27 15:23:29 pve pvedaemon[94707]: starting vnc proxy UPID:pve:000171F3:02857A43:67471D51:vncproxy:802:root@pam:
Nov 27 15:23:31 pve pvedaemon[85885]: <root@pam> end task UPID:pve:000171F3:02857A43:67471D51:vncproxy:802:root@pam: OK
Nov 27 15:23:31 pve pvedaemon[94714]: starting vnc proxy UPID:pve:000171FA:02857B52:67471D53:vncproxy:401:root@pam:
Nov 27 15:23:31 pve pvedaemon[85885]: <root@pam> starting task UPID:pve:000171FA:02857B52:67471D53:vncproxy:401:root@pam:
Nov 27 15:23:33 pve pvedaemon[85885]: <root@pam> end task UPID:pve:000171FA:02857B52:67471D53:vncproxy:401:root@pam: OK
Nov 27 15:23:34 pve pvedaemon[92736]: <root@pam> starting task UPID:pve:00017244:02857C2A:67471D56:vncproxy:310:root@pam:
Nov 27 15:23:34 pve pvedaemon[94788]: starting vnc proxy UPID:pve:00017244:02857C2A:67471D56:vncproxy:310:root@pam:
Nov 27 15:23:35 pve pvedaemon[92736]: <root@pam> end task UPID:pve:00017244:02857C2A:67471D56:vncproxy:310:root@pam: OK
Nov 27 15:23:35 pve pvedaemon[94798]: starting vnc proxy UPID:pve:0001724E:02857CE3:67471D57:vncproxy:300:root@pam:
Nov 27 15:23:35 pve pvedaemon[92736]: <root@pam> starting task UPID:pve:0001724E:02857CE3:67471D57:vncproxy:300:root@pam:
Nov 27 15:23:37 pve pvedaemon[92736]: <root@pam> end task UPID:pve:0001724E:02857CE3:67471D57:vncproxy:300:root@pam: OK
Nov 27 15:23:37 pve pvedaemon[94803]: starting vnc proxy UPID:pve:00017253:02857D91:67471D59:vncproxy:280:root@pam:
Nov 27 15:23:37 pve pvedaemon[93327]: <root@pam> starting task UPID:pve:00017253:02857D91:67471D59:vncproxy:280:root@pam:
Nov 27 15:23:40 pve pvedaemon[93327]: <root@pam> end task UPID:pve:00017253:02857D91:67471D59:vncproxy:280:root@pam: OK
Nov 27 15:23:40 pve pvedaemon[94827]: starting vnc proxy UPID:pve:0001726B:02857ED1:67471D5C:vncproxy:250:root@pam:
Nov 27 15:23:40 pve pvedaemon[85885]: <root@pam> starting task UPID:pve:0001726B:02857ED1:67471D5C:vncproxy:250:root@pam:
Nov 27 15:23:42 pve pvedaemon[85885]: <root@pam> end task UPID:pve:0001726B:02857ED1:67471D5C:vncproxy:250:root@pam: OK
Nov 27 15:23:42 pve pvedaemon[85885]: <root@pam> starting task UPID:pve:00017272:02857F9F:67471D5E:vncproxy:231:root@pam:
Nov 27 15:23:42 pve pvedaemon[94834]: starting vnc proxy UPID:pve:00017272:02857F9F:67471D5E:vncproxy:231:root@pam:
Nov 27 15:23:47 pve pvedaemon[85885]: <root@pam> end task UPID:pve:00017272:02857F9F:67471D5E:vncproxy:231:root@pam: OK
Nov 27 15:23:47 pve pvedaemon[94925]: starting vnc proxy UPID:pve:000172CD:02858165:67471D63:vncproxy:220:root@pam:
Nov 27 15:23:47 pve pvedaemon[85885]: <root@pam> starting task UPID:pve:000172CD:02858165:67471D63:vncproxy:220:root@pam:
Nov 27 15:23:49 pve pvedaemon[85885]: <root@pam> end task UPID:pve:000172CD:02858165:67471D63:vncproxy:220:root@pam: OK
Nov 27 15:23:49 pve pvedaemon[92736]: <root@pam> starting task UPID:pve:000172DE:0285820C:67471D65:vncproxy:210:root@pam:
Nov 27 15:23:49 pve pvedaemon[94942]: starting vnc proxy UPID:pve:000172DE:0285820C:67471D65:vncproxy:210:root@pam:
Nov 27 15:23:50 pve pvedaemon[92736]: <root@pam> end task UPID:pve:000172DE:0285820C:67471D65:vncproxy:210:root@pam: OK
Nov 27 15:23:50 pve pvedaemon[94952]: starting vnc proxy UPID:pve:000172E8:028582B6:67471D66:vncproxy:205:root@pam:
Nov 27 15:23:50 pve pvedaemon[92736]: <root@pam> starting task UPID:pve:000172E8:028582B6:67471D66:vncproxy:205:root@pam:
Nov 27 15:23:52 pve pvedaemon[92736]: <root@pam> end task UPID:pve:000172E8:028582B6:67471D66:vncproxy:205:root@pam: OK
Nov 27 15:23:52 pve pvedaemon[93327]: <root@pam> starting task UPID:pve:0001731F:0285835D:67471D68:vncproxy:204:root@pam:
Nov 27 15:23:52 pve pvedaemon[95007]: starting vnc proxy UPID:pve:0001731F:0285835D:67471D68:vncproxy:204:root@pam:
Nov 27 15:23:55 pve pvedaemon[93327]: <root@pam> end task UPID:pve:0001731F:0285835D:67471D68:vncproxy:204:root@pam: OK
Nov 27 15:23:55 pve kernel: pvedaemon worke[93327]: segfault at 6ffc6b4e6000 ip 00006ffc7889c680 sp 00007ffcb9727cc8 error 4 in libc.so.6[6ffc7875f000+155000] likely on CPU 11 (core 11, socket 0)
Nov 27 15:23:55 pve kernel: Code: 75 98 48 81 ea 80 00 00 00 0f 86 eb 00 00 00 48 81 c7 a0 00 00 00 48 01 fa 48 83 e7 80 48 29 fa 62 b1 fd 28 6f c0 0f 1f 40 00 <62> f3 7d 20 3f 0f 00 c5 fd 74 57 20 c5 fd 74 5f 40 c5 fd 74 67 60
Nov 27 15:23:55 pve pvedaemon[4943]: worker 93327 finished
Nov 27 15:23:55 pve pvedaemon[4943]: starting 1 worker(s)
Nov 27 15:23:55 pve pvedaemon[4943]: worker 95036 started
Nov 27 15:25:01 pve CRON[95755]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 27 15:25:01 pve CRON[95756]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 27 15:25:01 pve CRON[95755]: pam_unix(cron:session): session closed for user root
Nov 27 15:27:55 pve pvedaemon[95036]: <root@pam> update VM 201: -delete scsi0
Nov 27 15:28:14 pve pvedaemon[85885]: <root@pam> update VM 201: -virtio0 hddpooli3:vm-201-disk-0,discard=on,iothread=on,aio=threads
Nov 27 15:28:14 pve pvedaemon[85885]: <root@pam> starting task UPID:pve:00017E1D:0285E9BD:67471E6E:qmconfig:201:root@pam:
Nov 27 15:28:14 pve pvedaemon[85885]: <root@pam> end task UPID:pve:00017E1D:0285E9BD:67471E6E:qmconfig:201:root@pam: OK
Nov 27 15:28:18 pve pvedaemon[92736]: <root@pam> starting task UPID:pve:00017E27:0285EB2A:67471E72:qmstart:201:root@pam:
Nov 27 15:28:18 pve pvedaemon[97831]: start VM 201: UPID:pve:00017E27:0285EB2A:67471E72:qmstart:201:root@pam:
Nov 27 15:28:18 pve systemd[1]: Started 201.scope.
Nov 27 15:28:18 pve kernel: tap201i0: entered promiscuous mode
Nov 27 15:28:18 pve ovs-vsctl[97854]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap201i0
Nov 27 15:28:18 pve ovs-vsctl[97854]: ovs|00002|db_ctl_base|ERR|no port named tap201i0
Nov 27 15:28:18 pve ovs-vsctl[97855]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln201i0
Nov 27 15:28:18 pve ovs-vsctl[97855]: ovs|00002|db_ctl_base|ERR|no port named fwln201i0
Nov 27 15:28:18 pve ovs-vsctl[97856]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl -- add-port vmbr1 tap201i0 tag=201 -- set Interface tap201i0 mtu_request=1500
Nov 27 15:28:18 pve pvedaemon[97831]: VM 201 started with PID 97841.
Nov 27 15:28:18 pve pvedaemon[92736]: <root@pam> end task UPID:pve:00017E27:0285EB2A:67471E72:qmstart:201:root@pam: OK
Nov 27 15:28:18 pve pvedaemon[85885]: <root@pam> starting task UPID:pve:00017E50:0285EB69:67471E72:vncproxy:201:root@pam:
Nov 27 15:28:18 pve pvedaemon[97872]: starting vnc proxy UPID:pve:00017E50:0285EB69:67471E72:vncproxy:201:root@pam:
Nov 27 15:28:28 pve pvedaemon[85885]: <root@pam> end task UPID:pve:00017E50:0285EB69:67471E72:vncproxy:201:root@pam: OK
Nov 27 15:28:38 pve pvedaemon[95036]: <root@pam> update VM 201: -boot order=virtio0
Nov 27 15:28:41 pve pvedaemon[98129]: starting vnc proxy UPID:pve:00017F51:0285F42C:67471E89:vncproxy:201:root@pam:
Nov 27 15:28:41 pve pvedaemon[95036]: <root@pam> starting task UPID:pve:00017F51:0285F42C:67471E89:vncproxy:201:root@pam:
Nov 27 15:28:46 pve pvedaemon[98217]: shutdown VM 201: UPID:pve:00017FA9:0285F630:67471E8E:qmshutdown:201:root@pam:
Nov 27 15:28:46 pve pvedaemon[95036]: <root@pam> starting task UPID:pve:00017FA9:0285F630:67471E8E:qmshutdown:201:root@pam:
Nov 27 15:28:50 pve kernel:  zd320: p1 p2 p3
Nov 27 15:28:50 pve lvm[98253]: /dev/zd320p3 excluded: device is rejected by filter config.
Nov 27 15:28:50 pve ovs-vsctl[98260]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln201i0
Nov 27 15:28:50 pve ovs-vsctl[98260]: ovs|00002|db_ctl_base|ERR|no port named fwln201i0
Nov 27 15:28:50 pve ovs-vsctl[98261]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap201i0
Nov 27 15:28:50 pve qmeventd[4162]: read: Connection reset by peer
Nov 27 15:28:50 pve pvedaemon[95036]: <root@pam> end task UPID:pve:00017FA9:0285F630:67471E8E:qmshutdown:201:root@pam: OK
Nov 27 15:28:50 pve pvedaemon[95036]: <root@pam> end task UPID:pve:00017F51:0285F42C:67471E89:vncproxy:201:root@pam: OK
Nov 27 15:28:50 pve systemd[1]: 201.scope: Deactivated successfully.
Nov 27 15:28:50 pve systemd[1]: 201.scope: Consumed 14.789s CPU time.
Nov 27 15:28:51 pve qmeventd[98267]: Starting cleanup for 201
Nov 27 15:28:51 pve qmeventd[98267]: Finished cleanup for 201

What I did was run a few do-release-upgrades on a mostly offline Ubuntu VM, so there was a lot of text scrolling by.

Nothing I saw at any point indicated that anything had gone wrong.

The host is running ECC memory:

Code:
ras-mc-ctl --errors
No Memory errors.

No PCIe AER errors.

No Extlog errors.

No MCE errors.

-> No ECC corrected or reported errors
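
For completeness, the same counters can also be read straight from the EDAC sysfs interface; a minimal sketch (the paths only exist while an EDAC driver is loaded for the memory controller):

Code:
# corrected and uncorrected error counts per memory controller
grep . /sys/devices/system/edac/mc/mc*/ce_count
grep . /sys/devices/system/edac/mc/mc*/ue_count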

As requested by @mira

I did check on a few VMs during the process, as seen from 15:23 onwards.
 
Thanks for opening a separate thread!

Which kernel are you currently running?
uname -a

Which one did you use previously?
last -n 10 reboot

Please adapt the number so that it shows a reboot before the update as well. Did you install the microcode package for your CPU?
How old is the BIOS, and are there updates available?
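
For what it's worth, a one-shot sketch for collecting the requested details (adjust the reboot count as needed):

Code:
uname -a
last -n 20 reboot
dmidecode -t bios -q
journalctl -k | grep -i microcode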
 
Code:
root@pve:~# uname -a
Linux pve 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64 GNU/Linux

root@pve:~# last -n 10 reboot
reboot   system boot  6.8.12-4-pve     Fri Nov 22 17:53   still running
reboot   system boot  6.8.8-4-pve      Fri Nov  8 22:32 - 17:51 (13+19:18)
reboot   system boot  6.8.8-4-pve      Fri Nov  8 22:21 - 17:51 (13+19:30)
reboot   system boot  6.8.8-4-pve      Wed Sep 25 16:11 - 22:17 (44+07:06)
reboot   system boot  6.8.8-4-pve      Wed Sep 25 15:11 - 22:17 (44+08:05)
reboot   system boot  6.8.8-4-pve      Wed Sep 25 09:52 - 15:04  (05:11)
reboot   system boot  6.8.8-4-pve      Wed Sep 25 09:16 - 15:04  (05:47)
reboot   system boot  6.8.8-4-pve      Mon Sep 23 22:53 - 08:46 (1+09:53)
reboot   system boot  6.8.8-4-pve      Mon Sep 23 19:09 - 22:51  (03:42)
reboot   system boot  6.8.8-4-pve      Wed Aug 21 13:22 - 18:31 (33+05:09)

wtmp begins Tue Jan 17 05:52:25 2023
 
Code:
dmidecode -t bios -q
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 3024
        Release Date: 08/02/2024

Newest BIOS: 3035, dated 05/11/2024.
Edit: the DMI date format differs; it would have to read 02/08/2024 to match the dd/mm/yyyy format above.

Microcode is whatever the distribution (Proxmox VE 8.3) sees fit to distribute. See below:

Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-4-pve)
pve-manager: 8.3.0 (running version: 8.3.0/c1689ccb1065a83b)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.8: 6.8.12-4
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.8-4-pve-signed: 6.8.8-4
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.0
libpve-storage-perl: 8.2.9
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.2.9-1
proxmox-backup-file-restore: 3.2.9-1
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.3.1
pve-cluster: 8.0.10
pve-container: 5.2.2
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-1
pve-ha-manager: 4.0.6
pve-i18n: 3.3.1
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.0
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1
 
Do you have the amd-microcode package installed [0]?
dpkg -l amd64-microcode

So far, was this the only instance where you saw the segfault?


[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_firmware_cpu
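
For reference, on Proxmox VE 8 the package comes from Debian's non-free-firmware component; a minimal sketch of installing and verifying it, assuming that component is already enabled as described in [0]:

Code:
apt update
apt install amd64-microcode
# early microcode loading only takes effect after a reboot
reboot
# afterwards, check the kernel log for the applied revision
journalctl -k | grep -i microcode
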
I have not installed the amd-microcode package at all.

As I noted in the original message, this is the only 'abnormal' thing that has happened to me so far with PVE 8.3. All smiles. I'm waiting for the iommu fix for the 6.11 kernel, but there is nothing to complain about in this release at all.

I just thought that, since you have all the debug info for the executables, you could easily see where the malfunction was. I only see "??:0" as output from addr2line -e /usr/lib/x86_64-linux-gnu/libc.so.6 6ffc7889c680, as I don't have any debug info.
 
Sorry, I wrote the right package in the command afterwards, but the wrong one (amd-microcode) in the question.
If amd64-microcode is not installed, you may want to consider installing it.

As I noted in the original message, this is the only 'abnormal' thing that has happened to me so far with PVE 8.3. All smiles. I'm waiting for the iommu fix for the 6.11 kernel, but there is nothing to complain about in this release at all.
I'd suggest keeping an eye on it for now. If it happens again, we can try going back to an older kernel, and maybe we'll get a bit more information then.
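
If it does come to that, a minimal sketch of pinning the previous kernel with proxmox-boot-tool (version taken from the reboot history above):

Code:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.8.8-4-pve
reboot
# later, to return to the default kernel selection:
proxmox-boot-tool kernel unpin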


I just thought that, since you have all the debug info for the executables, you could easily see where the malfunction was. I only see "??:0" as output from addr2line -e /usr/lib/x86_64-linux-gnu/libc.so.6 6ffc7889c680, as I don't have any debug info.
The issue here is address randomization [0].



[0] https://en.wikipedia.org/wiki/Address_space_layout_randomization
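
That said, the kernel's segfault line above does include the start and size of the mapping in brackets (libc.so.6[6ffc7875f000+155000]). Assuming, as on current kernels, that this is the start of libc's executable segment mapping, the file-relative address can be reconstructed and fed to addr2line; a sketch (resolving to symbol names additionally needs the libc6-dbg package):

Code:
# offset of the faulting ip inside the executable mapping (values from the segfault line)
off=$(( 0x6ffc7889c680 - 0x6ffc7875f000 ))                               # 0x13d680
# virtual address of the executable LOAD segment ("R E") in the ELF headers
text=$(readelf -lW /usr/lib/x86_64-linux-gnu/libc.so.6 | awk '/LOAD/ && /R E/ {print $3}')
# resolve function name and source location
addr2line -f -e /usr/lib/x86_64-linux-gnu/libc.so.6 $(printf '0x%x' $(( text + off )))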
 
Those could help since they should contain the base address. Do you have one available?
 
