Some details
Every few days (at least once a week) the pvestatd service crashes.
I can still log in via the Proxmox GUI (and via SSH), but the containers are all displayed with a "?".
As soon as I restart the pvestatd service (which also works from the GUI), I can see the status of all CTs/VMs again.
Most of the CTs/VMs keep working and running fine, but not all of them.
There is no clear pattern; sometimes VM X is still running but the services on it are stopped.
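As a stopgap (not a fix for the segfault itself), pvestatd could be made to restart automatically after a crash with a systemd drop-in; this is just a sketch of such an override, assuming the stock pvestatd.service unit:
Code:
# create a drop-in override for pvestatd.service
systemctl edit pvestatd

# contents of the drop-in:
#   [Service]
#   Restart=on-failure
#   RestartSec=10

# apply and check
systemctl daemon-reload
systemctl status pvestatd
That only masks the problem, of course, so the logs below are what I have gathered so far.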
Technical Details
- Hardware: Minisforum MS-01
- CPU: 13th Gen Intel(R) Core(TM) i9-13900H (from cat /proc/cpuinfo)
- RAM: 96GB, Crucial DDR5 Kit (2x48GB) 5600MHz SODIMM
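Output of pveversion -v:
Code: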
proxmox-ve: 8.4.0 (running kernel: 6.14.0-2-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.14.0-2-pve-signed: 6.14.0-2
proxmox-kernel-6.14: 6.14.0-2
proxmox-kernel-6.14.0-1-pve-signed: 6.14.0-1
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8: 6.8.12-10
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
frr-pythontools: 10.2.2-1+pve1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250211.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.3
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
22 April
journalctl -u pvestatd
Code:
Apr 22 01:00:02 c513 pvestatd[3825]: unable to get PID for CT 253 (not running?)
Apr 22 03:44:35 c513 pvestatd[3825]: status update time (5.898 seconds)
Apr 22 04:45:10 c513 pvestatd[3825]: unable to get PID for CT 108 (not running?)
Apr 22 04:45:40 c513 pvestatd[3825]: modified cpu set for lxc/108: 6,9
Apr 22 04:46:00 c513 pvestatd[3825]: modified cpu set for lxc/108: 9,13
Apr 22 04:46:00 c513 pvestatd[3825]: modified cpu set for lxc/109: 6,14
Apr 22 05:15:21 c513 pvestatd[3825]: modified cpu set for lxc/100: 4-5
Apr 22 05:15:30 c513 pvestatd[3825]: modified cpu set for lxc/109: 8,14
Apr 22 05:15:30 c513 pvestatd[3825]: modified cpu set for lxc/259: 6,15
Apr 22 05:30:21 c513 pvestatd[3825]: modified cpu set for lxc/199: 7,18
Apr 22 05:30:21 c513 pvestatd[3825]: modified cpu set for lxc/253: 9,19
Apr 22 05:30:42 c513 pvestatd[3825]: modified cpu set for lxc/259: 1,15
Apr 22 05:30:42 c513 pvestatd[3825]: modified cpu set for lxc/301: 2,6
Apr 22 05:30:50 c513 pvestatd[3825]: unable to get PID for CT 303 (not running?)
Apr 22 05:32:10 c513 pvestatd[3825]: modified cpu set for lxc/108: 0,13
Apr 22 05:32:10 c513 pvestatd[3825]: modified cpu set for lxc/253: 1,9
Apr 22 05:32:31 c513 pvestatd[3825]: modified cpu set for lxc/108: 10,13
Apr 22 05:32:31 c513 pvestatd[3825]: modified cpu set for lxc/301: 2,11
Apr 22 05:33:00 c513 pvestatd[3825]: modified cpu set for lxc/198: 16-17
Apr 22 05:33:00 c513 pvestatd[3825]: modified cpu set for lxc/300: 17,19
Apr 22 05:33:50 c513 pvestatd[3825]: modified cpu set for lxc/300: 8,17
Apr 22 05:37:40 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 22 05:37:40 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 22 05:37:40 c513 systemd[1]: pvestatd.service: Consumed 57min 30.326s CPU time.
Apr 22 06:50:10 c513 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 22 06:50:10 c513 pvestatd[1218420]: starting server
Apr 22 06:50:10 c513 systemd[1]: Started pvestatd.service - PVE Status Daemon.
cat /var/log/syslog
Code:
2025-04-22T05:37:40.222420+02:00 c513 kernel: [32210.491667] pvestatd[3825]: segfault at 32 ip 00005e499fa82232 sp 00007fff63bc9b00 error 4 in perl[ff232,5e499f9cc000+195000] likely on CPU 6 (core 12, socket 0)
2025-04-22T05:37:40.226321+02:00 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
26 April
journalctl -u pvestatd
Code:
Apr 26 05:32:27 c513 pvestatd[2555394]: modified cpu set for lxc/109: 8,18
Apr 26 05:33:07 c513 pvestatd[2555394]: modified cpu set for lxc/100: 15-16
Apr 26 05:54:18 c513 pvestatd[2555394]: auth key pair too old, rotating..
Apr 26 06:24:47 c513 pvestatd[2555394]: Argument "2555394:1658635" isn't numeric in int at /usr/share/perl5/PVE/QMPClient.pm line 273.
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Consumed 9h 24min 56.262s CPU time.
Apr 28 07:06:47 c513 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 28 07:06:48 c513 pvestatd[1083338]: starting server
Apr 28 07:06:48 c513 systemd[1]: Started pvestatd.service - PVE Status Daemon.
cat /var/log/syslog
Code:
Apr 26 06:34:27 c513 kernel: pvestatd[2555394]: segfault at ffffffffffffffff ip 0000653ef51344dc sp 00007ffeae4bab10 error 7 in perl[1344dc,653ef5049000+195000] likely on CPU 6 (core 12, socket 0)
Apr 26 06:34:27 c513 kernel: Code: 8b 43 0c e9 6a ff ff ff 66 0f 1f 44 00 00 3c 02 0f 86 a0 00 00 00 0d 00 00 00 10 48 8b 55 10 89 45 0c 48 8b 45 00 48 8b 40 18 <c6> 44 02 ff 00 48 8b 45 00 48 8b 75 10 48 8b 40 18 e9 73 ff ff ff
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Consumed 9h 24min 56.262s CPU time.
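To get more than the kernel's one-line segfault notice, it would probably help to capture a core dump of the next crash; a minimal sketch using systemd-coredump (standard Debian 12 packages assumed, gdb only needed for the last step):
Code:
# install the core dump handler
apt install systemd-coredump gdb

# after the next pvestatd segfault:
coredumpctl list pvestatd        # list captured dumps for pvestatd
coredumpctl info pvestatd        # signal, maps and a backtrace if symbols are present
coredumpctl gdb pvestatd         # open the most recent dump in gdb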
My Idea
At first I thought it was somehow due to the RAM, as it was slightly overprovisioned and I use ZFS, but I have since set the ZFS max ARC size to 6GB and changed all RAM assignments, so that I currently end up at about 76GB of assigned RAM for the VMs alone. I have also already deactivated ksmtuned.
Could it perhaps be the CPU set scheduling in combination with the hybrid little-big (P-core/E-core) CPU architecture?
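In case it helps with diagnosis, this is how the ARC cap can be verified and how a container could be pinned to the P-cores as an experiment. The cpuset line and the CPU numbering are assumptions on my part (whether 0-11 really are the P-core hyper-threads on the i9-13900H should be checked with lscpu), and I am not sure whether pvestatd's automatic rebalancing would simply override such a pin:
Code:
# verify the ARC cap is applied (6 GiB = 6442450944 bytes)
cat /sys/module/zfs/parameters/zfs_arc_max
# (the cap is typically set via /etc/modprobe.d/zfs.conf with
#  "options zfs zfs_arc_max=6442450944", then update-initramfs -u -k all and a reboot)

# which logical CPUs are P-cores vs E-cores (compare CORE ids / MAXMHZ)
lscpu -e

# cpusets that pvestatd keeps rebalancing for the containers (cgroup v2)
cat /sys/fs/cgroup/lxc/*/cpuset.cpus.effective

# experiment: pin one container to P-core threads only by adding a raw
# LXC line to /etc/pve/lxc/<vmid>.conf (<vmid> is a placeholder):
#   lxc.cgroup2.cpuset.cpus: 0-11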