Some details
Every few days (at least once a week) the pvestatd service crashes.
I can still log in via the Proxmox GUI (and via SSH), but the containers are all displayed with a "?".
As soon as I restart the pvestatd service (which also works from the GUI), I can see the status of all CTs/VMs again.
Most of the CTs/VMs keep working and running fine, but not all of them.
There is no clear pattern; sometimes VM X is still running but the services on it are stopped.
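As a stopgap (not a fix for the segfault itself), pvestatd could be made to restart automatically after a crash with a systemd drop-in; this is just a sketch of such an override, assuming the stock pvestatd.service unit:
Code:
# create a drop-in override for pvestatd.service
systemctl edit pvestatd

# contents of the drop-in:
#   [Service]
#   Restart=on-failure
#   RestartSec=10

# apply and check
systemctl daemon-reload
systemctl status pvestatd
That only masks the problem, of course, so the logs below are what I have gathered so far.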
Technical Details
- Hardware: Minisforum MS-01
- CPU: 13th Gen Intel(R) Core(TM) i9-13900H (from cat /proc/cpuinfo)
- RAM: 96GB, Crucial DDR5 Kit (2x48GB) 5600MHz SODIMM
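Output of pveversion -v:
Code: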
proxmox-ve: 8.4.0 (running kernel: 6.14.0-2-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.14.0-2-pve-signed: 6.14.0-2
proxmox-kernel-6.14: 6.14.0-2
proxmox-kernel-6.14.0-1-pve-signed: 6.14.0-1
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8: 6.8.12-10
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
frr-pythontools: 10.2.2-1+pve1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250211.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.3
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
22 April
journalctl -u pvestatd
Code:
Apr 22 01:00:02 c513 pvestatd[3825]: unable to get PID for CT 253 (not running?)
Apr 22 03:44:35 c513 pvestatd[3825]: status update time (5.898 seconds)
Apr 22 04:45:10 c513 pvestatd[3825]: unable to get PID for CT 108 (not running?)
Apr 22 04:45:40 c513 pvestatd[3825]: modified cpu set for lxc/108: 6,9
Apr 22 04:46:00 c513 pvestatd[3825]: modified cpu set for lxc/108: 9,13
Apr 22 04:46:00 c513 pvestatd[3825]: modified cpu set for lxc/109: 6,14
Apr 22 05:15:21 c513 pvestatd[3825]: modified cpu set for lxc/100: 4-5
Apr 22 05:15:30 c513 pvestatd[3825]: modified cpu set for lxc/109: 8,14
Apr 22 05:15:30 c513 pvestatd[3825]: modified cpu set for lxc/259: 6,15
Apr 22 05:30:21 c513 pvestatd[3825]: modified cpu set for lxc/199: 7,18
Apr 22 05:30:21 c513 pvestatd[3825]: modified cpu set for lxc/253: 9,19
Apr 22 05:30:42 c513 pvestatd[3825]: modified cpu set for lxc/259: 1,15
Apr 22 05:30:42 c513 pvestatd[3825]: modified cpu set for lxc/301: 2,6
Apr 22 05:30:50 c513 pvestatd[3825]: unable to get PID for CT 303 (not running?)
Apr 22 05:32:10 c513 pvestatd[3825]: modified cpu set for lxc/108: 0,13
Apr 22 05:32:10 c513 pvestatd[3825]: modified cpu set for lxc/253: 1,9
Apr 22 05:32:31 c513 pvestatd[3825]: modified cpu set for lxc/108: 10,13
Apr 22 05:32:31 c513 pvestatd[3825]: modified cpu set for lxc/301: 2,11
Apr 22 05:33:00 c513 pvestatd[3825]: modified cpu set for lxc/198: 16-17
Apr 22 05:33:00 c513 pvestatd[3825]: modified cpu set for lxc/300: 17,19
Apr 22 05:33:50 c513 pvestatd[3825]: modified cpu set for lxc/300: 8,17
Apr 22 05:37:40 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 22 05:37:40 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 22 05:37:40 c513 systemd[1]: pvestatd.service: Consumed 57min 30.326s CPU time.
Apr 22 06:50:10 c513 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 22 06:50:10 c513 pvestatd[1218420]: starting server
Apr 22 06:50:10 c513 systemd[1]: Started pvestatd.service - PVE Status Daemon.
cat /var/log/syslog
Code:
2025-04-22T05:37:40.222420+02:00 c513 kernel: [32210.491667] pvestatd[3825]: segfault at 32 ip 00005e499fa82232 sp 00007fff63bc9b00 error 4 in perl[ff232,5e499f9cc000+195000] likely on CPU 6 (core 12, socket 0)
2025-04-22T05:37:40.226321+02:00 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
26 April
journalctl -u pvestatd
Code:
Apr 26 05:32:27 c513 pvestatd[2555394]: modified cpu set for lxc/109: 8,18
Apr 26 05:33:07 c513 pvestatd[2555394]: modified cpu set for lxc/100: 15-16
Apr 26 05:54:18 c513 pvestatd[2555394]: auth key pair too old, rotating..
Apr 26 06:24:47 c513 pvestatd[2555394]: Argument "2555394:1658635" isn't numeric in int at /usr/share/perl5/PVE/QMPClient.pm line 273.
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Consumed 9h 24min 56.262s CPU time.
Apr 28 07:06:47 c513 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 28 07:06:48 c513 pvestatd[1083338]: starting server
Apr 28 07:06:48 c513 systemd[1]: Started pvestatd.service - PVE Status Daemon.
cat /var/log/syslog
Code:
Apr 26 06:34:27 c513 kernel: pvestatd[2555394]: segfault at ffffffffffffffff ip 0000653ef51344dc sp 00007ffeae4bab10 error 7 in perl[1344dc,653ef5049000+195000] likely on CPU 6 (core 12, socket 0)
Apr 26 06:34:27 c513 kernel: Code: 8b 43 0c e9 6a ff ff ff 66 0f 1f 44 00 00 3c 02 0f 86 a0 00 00 00 0d 00 00 00 10 48 8b 55 10 89 45 0c 48 8b 45 00 48 8b 40 18 <c6> 44 02 ff 00 48 8b 45 00 48 8b 75 10 48 8b 40 18 e9 73 ff ff ff
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Consumed 9h 24min 56.262s CPU time.
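To get more than the kernel's one-line segfault notice, it would probably help to capture a core dump of the next crash; a minimal sketch using systemd-coredump (standard Debian 12 packages assumed, gdb only needed for the last step):
Code:
# install the core dump handler
apt install systemd-coredump gdb

# after the next pvestatd segfault:
coredumpctl list pvestatd        # list captured dumps for pvestatd
coredumpctl info pvestatd        # signal, maps and a backtrace if symbols are present
coredumpctl gdb pvestatd         # open the most recent dump in gdb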
My Idea
At first I thought it was somehow due to the RAM, as it was slightly overprovisioned and I use ZFS, but I have since set the ZFS max ARC size to 6GB and changed all RAM assignments, so that I currently end up at about 76GB of assigned RAM for the VMs alone. I have also already deactivated ksmtuned.
Could it perhaps be the CPU set scheduling in combination with the hybrid little-big (P-core/E-core) CPU architecture?
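In case it helps with diagnosis, this is how the ARC cap can be verified and how a container could be pinned to the P-cores as an experiment. The cpuset line and the CPU numbering are assumptions on my part (whether 0-11 really are the P-core hyper-threads on the i9-13900H should be checked with lscpu), and I am not sure whether pvestatd's automatic rebalancing would simply override such a pin:
Code:
# verify the ARC cap is applied (6 GiB = 6442450944 bytes)
cat /sys/module/zfs/parameters/zfs_arc_max
# (the cap is typically set via /etc/modprobe.d/zfs.conf with
#  "options zfs zfs_arc_max=6442450944", then update-initramfs -u -k all and a reboot)

# which logical CPUs are P-cores vs E-cores (compare CORE ids / MAXMHZ)
lscpu -e

# cpusets that pvestatd keeps rebalancing for the containers (cgroup v2)
cat /sys/fs/cgroup/lxc/*/cpuset.cpus.effective

# experiment: pin one container to P-core threads only by adding a raw
# LXC line to /etc/pve/lxc/<vmid>.conf (<vmid> is a placeholder):
#   lxc.cgroup2.cpuset.cpus: 0-11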