pvestatd crashes every few days

xenon96

Some details

Every few days (at least once a week), the pvestatd service crashes.
I can still log in via the Proxmox GUI (and via SSH), but all containers are displayed with a "?".
As soon as I restart the pvestatd service (which also works from the GUI), I can see the status of all CTs/VMs again.
Most of the CTs/VMs keep running fine, but not all of them.
There is no pattern; sometimes VM X is still running but the services on it have stopped.

Technical Details

- Hardware: Minisforum MS-01
- CPU: 13th Gen Intel(R) Core(TM) i9-13900H (from cat /proc/cpuinfo)
- RAM: 96GB Crucial DDR5 RAM 96GB Kit (2x48GB) 5600MHz SODIMM

proxmox-ve: 8.4.0 (running kernel: 6.14.0-2-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.14.0-2-pve-signed: 6.14.0-2
proxmox-kernel-6.14: 6.14.0-2
proxmox-kernel-6.14.0-1-pve-signed: 6.14.0-1
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8: 6.8.12-10
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
frr-pythontools: 10.2.2-1+pve1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250211.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.3
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2

22 April

journalctl -u pvestatd

Code:
Apr 22 01:00:02 c513 pvestatd[3825]: unable to get PID for CT 253 (not running?)
Apr 22 03:44:35 c513 pvestatd[3825]: status update time (5.898 seconds)
Apr 22 04:45:10 c513 pvestatd[3825]: unable to get PID for CT 108 (not running?)
Apr 22 04:45:40 c513 pvestatd[3825]: modified cpu set for lxc/108: 6,9
Apr 22 04:46:00 c513 pvestatd[3825]: modified cpu set for lxc/108: 9,13
Apr 22 04:46:00 c513 pvestatd[3825]: modified cpu set for lxc/109: 6,14
Apr 22 05:15:21 c513 pvestatd[3825]: modified cpu set for lxc/100: 4-5
Apr 22 05:15:30 c513 pvestatd[3825]: modified cpu set for lxc/109: 8,14
Apr 22 05:15:30 c513 pvestatd[3825]: modified cpu set for lxc/259: 6,15
Apr 22 05:30:21 c513 pvestatd[3825]: modified cpu set for lxc/199: 7,18
Apr 22 05:30:21 c513 pvestatd[3825]: modified cpu set for lxc/253: 9,19
Apr 22 05:30:42 c513 pvestatd[3825]: modified cpu set for lxc/259: 1,15
Apr 22 05:30:42 c513 pvestatd[3825]: modified cpu set for lxc/301: 2,6
Apr 22 05:30:50 c513 pvestatd[3825]: unable to get PID for CT 303 (not running?)
Apr 22 05:32:10 c513 pvestatd[3825]: modified cpu set for lxc/108: 0,13
Apr 22 05:32:10 c513 pvestatd[3825]: modified cpu set for lxc/253: 1,9
Apr 22 05:32:31 c513 pvestatd[3825]: modified cpu set for lxc/108: 10,13
Apr 22 05:32:31 c513 pvestatd[3825]: modified cpu set for lxc/301: 2,11
Apr 22 05:33:00 c513 pvestatd[3825]: modified cpu set for lxc/198: 16-17
Apr 22 05:33:00 c513 pvestatd[3825]: modified cpu set for lxc/300: 17,19
Apr 22 05:33:50 c513 pvestatd[3825]: modified cpu set for lxc/300: 8,17
Apr 22 05:37:40 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 22 05:37:40 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 22 05:37:40 c513 systemd[1]: pvestatd.service: Consumed 57min 30.326s CPU time.
Apr 22 06:50:10 c513 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 22 06:50:10 c513 pvestatd[1218420]: starting server
Apr 22 06:50:10 c513 systemd[1]: Started pvestatd.service - PVE Status Daemon.

cat /var/log/syslog

Code:
2025-04-22T05:37:40.222420+02:00 c513 kernel: [32210.491667] pvestatd[3825]: segfault at 32 ip 00005e499fa82232 sp 00007fff63bc9b00 error 4 in perl[ff232,5e499f9cc000+195000] likely on CPU 6 (core 12, socket 0)
2025-04-22T05:37:40.226321+02:00 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV

26 April

journalctl -u pvestatd

Code:
Apr 26 05:32:27 c513 pvestatd[2555394]: modified cpu set for lxc/109: 8,18
Apr 26 05:33:07 c513 pvestatd[2555394]: modified cpu set for lxc/100: 15-16
Apr 26 05:54:18 c513 pvestatd[2555394]: auth key pair too old, rotating..
Apr 26 06:24:47 c513 pvestatd[2555394]: Argument "2555394:1658635" isn't numeric in int at /usr/share/perl5/PVE/QMPClient.pm line 273.
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Consumed 9h 24min 56.262s CPU time.
Apr 28 07:06:47 c513 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 28 07:06:48 c513 pvestatd[1083338]: starting server
Apr 28 07:06:48 c513 systemd[1]: Started pvestatd.service - PVE Status Daemon.

cat /var/log/syslog

Code:
Apr 26 06:34:27 c513 kernel: pvestatd[2555394]: segfault at ffffffffffffffff ip 0000653ef51344dc sp 00007ffeae4bab10 error 7 in perl[1344dc,653ef5049000+195000] likely on CPU 6 (core 12, socket 0)
Apr 26 06:34:27 c513 kernel: Code: 8b 43 0c e9 6a ff ff ff 66 0f 1f 44 00 00 3c 02 0f 86 a0 00 00 00 0d 00 00 00 10 48 8b 55 10 89 45 0c 48 8b 45 00 48 8b 40 18 <c6> 44 02 ff 00 48 8b 45 00 48 8b 75 10 48 8b 40 18 e9 73 ff ff ff
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 26 06:34:27 c513 systemd[1]: pvestatd.service: Consumed 9h 24min 56.262s CPU time.
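
To compare the crash sites across runs, the kernel segfault lines can be broken into their fields. The following is only a minimal Python sketch: the function name `parse_segfault` and the dictionary keys are made up for illustration, and it assumes the bracket after `perl` has the form `[offset,base+size]`. The error code is decoded using the usual x86 page-fault bits (1 = page present, 2 = write, 4 = user mode).

```python
import re

# Pattern for the kernel's user-space segfault report as it appears in the logs above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>\d+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<offset>[0-9a-f]+),(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def parse_segfault(line):
    """Return the fields of one segfault line, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    error = int(m["error"])
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "object": m["obj"],
        "reported_offset": int(m["offset"], 16),  # first value inside the brackets
        "ip_minus_base": ip - base,               # instruction pointer relative to the mapping
        "page_present": bool(error & 1),
        "write": bool(error & 2),
        "user_mode": bool(error & 4),
    }

# The two segfault lines posted above (22 April and 26 April).
LINES = [
    "2025-04-22T05:37:40.222420+02:00 c513 kernel: [32210.491667] pvestatd[3825]: "
    "segfault at 32 ip 00005e499fa82232 sp 00007fff63bc9b00 error 4 in "
    "perl[ff232,5e499f9cc000+195000] likely on CPU 6 (core 12, socket 0)",
    "Apr 26 06:34:27 c513 kernel: pvestatd[2555394]: segfault at ffffffffffffffff "
    "ip 0000653ef51344dc sp 00007ffeae4bab10 error 7 in "
    "perl[1344dc,653ef5049000+195000] likely on CPU 6 (core 12, socket 0)",
]

for line in LINES:
    info = parse_segfault(line)
    print(info["pid"], hex(info["reported_offset"]), hex(info["ip_minus_base"]),
          "write" if info["write"] else "read")
```

For these two lines, ip minus base is 0xb6232 and 0xeb4dc, while the bracket reports 0xff232 and 0x1344dc. The difference is 0x49000 in both cases, which would fit the bracketed value being an offset into the perl binary, but that interpretation is an assumption. Either way, the two crashes are at different locations inside perl.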

My Idea

At first I thought it was related to the RAM, since it was slightly overprovisioned and I use ZFS. I then set the maximum ZFS ARC size (zfs_arc_max) to 6GB and changed all RAM assignments, so that about 76GB of RAM is currently assigned to the VMs alone. I have also already disabled ksmtuned.
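
For reference, the arithmetic behind those numbers can be written down as a small Python sketch. It assumes that "6GB" means 6 GiB and that the limit is set through the `zfs_arc_max` module parameter, commonly via an `options zfs ...` line in a modprobe configuration file. The helper names are made up for illustration, and the headroom figure simply treats the 96GB, 76GB and 6GB values from this post as the same unit.

```python
GIB = 1024 ** 3  # bytes per GiB

def arc_max_bytes(gib):
    """Convert an ARC limit given in GiB to the byte value zfs_arc_max expects."""
    return int(gib * GIB)

def modprobe_line(gib):
    """Format the module option line for the given ARC limit (illustrative helper)."""
    return f"options zfs zfs_arc_max={arc_max_bytes(gib)}"

def headroom(total, assigned_to_guests, arc_limit):
    """Memory left for the host itself, all values in the same unit."""
    return total - assigned_to_guests - arc_limit

print(modprobe_line(6))     # options zfs zfs_arc_max=6442450944
print(headroom(96, 76, 6))  # 14
```

With these figures, roughly 14GB remain for the host and anything outside the guests, so plain memory exhaustion looks less likely, although this rough budget cannot rule it out.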

Could it be caused by the CPU set scheduling in combination with the hybrid (big.LITTLE-style, P-core/E-core) CPU architecture?
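
One way to look into this is to check which core types the logged CPU sets land on. The Python sketch below parses a CPU list as printed in the "modified cpu set" messages and splits it by core type. The P-core/E-core mapping used here (logical CPUs 0-11 as P-core threads, 12-19 as E-cores) is only an assumption for the i9-13900H and should be verified on the host, for example against the CPU lists the kernel exposes for hybrid Intel CPUs. The function names are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a CPU list such as '7,14-15,18' into a set of CPU numbers."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def classify(cpuset, p_cpus, e_cpus):
    """Split a CPU list into the CPUs that fall on P-cores and on E-cores."""
    cpus = parse_cpu_list(cpuset)
    return {"p": sorted(cpus & p_cpus), "e": sorted(cpus & e_cpus)}

# ASSUMPTION: logical CPU numbering on this machine; not confirmed by the logs.
P_CPUS = set(range(0, 12))
E_CPUS = set(range(12, 20))

# CPU sets taken from the pvestatd messages in this thread.
for ct, cpuset in [("lxc/103", "7,14-15,18"), ("lxc/100", "4-5"), ("lxc/198", "16-17")]:
    print(ct, classify(cpuset, P_CPUS, E_CPUS))
```

Under that assumed mapping, lxc/103 spans both core types while lxc/100 and lxc/198 each stay on one. This only shows how the sets are distributed; it does not establish that mixed sets cause the crashes.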
 
pvestatd has failed again.

Apr 29 04:02:46 c513 pvestatd[3875]: status update time (5.119 seconds)
Apr 29 04:45:12 c513 pvestatd[3875]: modified cpu set for lxc/100: 8,10
Apr 29 04:45:12 c513 pvestatd[3875]: modified cpu set for lxc/103: 7,14-15,18
Apr 29 04:45:41 c513 pvestatd[3875]: modified cpu set for lxc/103: 7,9,14-15
Apr 29 04:45:41 c513 pvestatd[3875]: modified cpu set for lxc/108: 18-19
Apr 29 05:15:11 c513 pvestatd[3875]: modified cpu set for lxc/109: 8,10
Apr 29 05:16:19 c513 pvestatd[3875]: VM 901 qmp command failed - VM 901 qmp command 'query-proxmox-support' failed - got timeout
Apr 29 05:16:20 c513 pvestatd[3875]: modified cpu set for lxc/100: 0,18
Apr 29 05:16:21 c513 pvestatd[3875]: status update time (9.462 seconds)
Apr 29 05:16:29 c513 pvestatd[3875]: VM 901 qmp command failed - VM 901 qmp command 'query-proxmox-support' failed - unable to connect to VM>
Apr 29 05:16:30 c513 pvestatd[3875]: status update time (9.634 seconds)
Apr 29 05:16:40 c513 pvestatd[3875]: status update time (8.436 seconds)
Apr 29 05:30:11 c513 pvestatd[3875]: modified cpu set for lxc/100: 18-19
Apr 29 05:30:21 c513 pvestatd[3875]: modified cpu set for lxc/100: 0,19
Apr 29 05:30:21 c513 pvestatd[3875]: modified cpu set for lxc/254: 1-2
Apr 29 05:30:22 c513 pvestatd[3875]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/LXC.pm line 305.
Apr 29 05:30:22 c513 pvestatd[3875]: lxc console cleanup error: failed to read from command socket: Connection reset by peer
Apr 29 05:30:32 c513 pvestatd[3875]: modified cpu set for lxc/100: 16,19
Apr 29 05:30:41 c513 pvestatd[3875]: modified cpu set for lxc/300: 0,18
Apr 29 05:30:51 c513 pvestatd[3875]: modified cpu set for lxc/108: 3,19
Apr 29 05:32:11 c513 pvestatd[3875]: modified cpu set for lxc/189: 5-6,10-11
Apr 29 05:32:22 c513 pvestatd[3875]: modified cpu set for lxc/300: 2,18
Apr 29 05:32:23 c513 pvestatd[3875]: unable to get PID for CT 309 (not running?)
Apr 29 05:33:11 c513 pvestatd[3875]: modified cpu set for lxc/102: 0,11
Apr 29 05:33:11 c513 pvestatd[3875]: modified cpu set for lxc/108: 1,19
Apr 29 05:59:01 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 29 05:59:01 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 29 05:59:01 c513 systemd[1]: pvestatd.service: Consumed 2h 23min 7.263s CPU time.
 
pvestatd has failed again.

Please tell me if I should look at specific logs or services. Otherwise I'll keep posting error messages.

Apr 29 19:19:36 c513 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
Apr 29 19:19:36 c513 kernel: #PF: supervisor read access in kernel mode
Apr 29 19:19:36 c513 kernel: #PF: error_code(0x0000) - not-present page
Apr 29 19:19:36 c513 kernel: PGD 0 P4D 0
Apr 29 19:19:36 c513 kernel: Oops: Oops: 0000 [#5] PREEMPT SMP NOPTI
Apr 29 19:19:36 c513 kernel: CPU: 6 UID: 0 PID: 89922 Comm: pvestatd Tainted: P UD W OE 6.14.0-2-pve #1
Apr 29 19:19:36 c513 kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [D]=DIE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Apr 29 19:19:36 c513 kernel: Hardware name: Micro Computer (HK) Tech Limited Venus Series/AHWSA, BIOS AHWSA.1.22 03/12/2024
Apr 29 19:19:36 c513 kernel: RIP: 0010:free_pages_and_swap_cache+0x2a/0x1c0
Apr 29 19:19:36 c513 kernel: Code: 0f 1f 44 00 00 55 b9 20 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 48 8d bd d0 fe ff ff 53 48 81 ec 88 01 00 00 65 48 <8b> 04 25 28 00 00 00 48 89 45 d0 31 c0 f3 48 ab 48 8d bd 58 fe ff
Apr 29 19:19:36 c513 kernel: RSP: 0018:ffffbe8b81377810 EFLAGS: 00010246
Apr 29 19:19:36 c513 kernel: RAX: 0000000000000000 RBX: 00000000000001fd RCX: ffffef8e49a3be08
Apr 29 19:19:36 c513 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 29 19:19:36 c513 kernel: RBP: ffffbe8b813779c0 R08: 0000000000000000 R09: 0000000000000000
Apr 29 19:19:36 c513 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a953bfd8010
Apr 29 19:19:36 c513 kernel: R13: ffffef8e72e6d200 R14: ffff9a953bfd86b8 R15: 00000000000000d5
Apr 29 19:19:36 c513 kernel: FS: 0000000000000000(0000) GS:ffff9a97af300000(0000) knlGS:0000000000000000
Apr 29 19:19:36 c513 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 29 19:19:36 c513 kernel: CR2: 0000000000000028 CR3: 000000147a282005 CR4: 0000000000f72ef0
Apr 29 19:19:36 c513 kernel: PKRU: 55555554
Apr 29 19:19:36 c513 kernel: Call Trace:
Apr 29 19:19:36 c513 kernel: <TASK>
Apr 29 19:19:36 c513 kernel: ? show_regs+0x6c/0x80
Apr 29 19:19:36 c513 kernel: ? __die+0x24/0x80
Apr 29 19:19:36 c513 kernel: ? page_fault_oops+0x175/0x5e0
Apr 29 19:19:36 c513 kernel: ? do_user_addr_fault+0x4a5/0x830
Apr 29 19:19:36 c513 kernel: ? __count_memcg_events+0xc0/0x160
Apr 29 19:19:36 c513 kernel: ? exc_page_fault+0x85/0x1e0
Apr 29 19:19:36 c513 kernel: ? asm_exc_page_fault+0x27/0x30
Apr 29 19:19:36 c513 kernel: ? free_pages_and_swap_cache+0x2a/0x1c0
Apr 29 19:19:36 c513 kernel: __tlb_batch_free_encoded_pages+0x45/0xb0
Apr 29 19:19:36 c513 kernel: tlb_flush_mmu+0x52/0x150
Apr 29 19:19:36 c513 kernel: unmap_page_range+0xc1d/0x1ab0
Apr 29 19:19:36 c513 kernel: unmap_single_vma+0x89/0xf0
Apr 29 19:19:36 c513 kernel: unmap_vmas+0xb5/0x190
Apr 29 19:19:36 c513 kernel: exit_mmap+0xfa/0x3f0
Apr 29 19:19:36 c513 kernel: mmput+0x69/0x130
Apr 29 19:19:36 c513 kernel: do_exit+0x2c9/0xab0
Apr 29 19:19:36 c513 kernel: do_group_exit+0x34/0x90
Apr 29 19:19:36 c513 kernel: __x64_sys_exit_group+0x18/0x20
Apr 29 19:19:36 c513 kernel: x64_sys_call+0xf7a/0x2540
Apr 29 19:19:36 c513 kernel: do_syscall_64+0x7e/0x170
Apr 29 19:19:36 c513 kernel: ? __count_memcg_events+0xc0/0x160
Apr 29 19:19:36 c513 kernel: ? count_memcg_events.constprop.0+0x2a/0x50
Apr 29 19:19:36 c513 kernel: ? handle_mm_fault+0xae/0x360
Apr 29 19:19:36 c513 kernel: ? do_user_addr_fault+0x5ec/0x830
Apr 29 19:19:36 c513 kernel: ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
Apr 29 19:19:36 c513 kernel: ? irqentry_exit_to_user_mode+0x2d/0x1d0
Apr 29 19:19:36 c513 kernel: ? irqentry_exit+0x43/0x50
Apr 29 19:19:36 c513 kernel: ? exc_page_fault+0x96/0x1e0
Apr 29 19:19:36 c513 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 29 19:19:36 c513 kernel: RIP: 0033:0x7b77659e9409
Apr 29 19:19:36 c513 kernel: Code: Unable to access opcode bytes at 0x7b77659e93df.
Apr 29 19:19:36 c513 kernel: RSP: 002b:00007ffc6064eb78 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
Apr 29 19:19:36 c513 kernel: RAX: ffffffffffffffda RBX: 0000603f92b5f2a0 RCX: 00007b77659e9409
Apr 29 19:19:36 c513 kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
Apr 29 19:19:36 c513 kernel: RBP: 0000000000000001 R08: ffffffffffffff78 R09: 0000000000000000
Apr 29 19:19:36 c513 kernel: R10: 00007b7765927200 R11: 0000000000000206 R12: 000000000000001d
Apr 29 19:19:36 c513 kernel: R13: 0000000000000000 R14: 0000603f98ce8ba0 R15: 0000603f930ba208
Apr 29 19:19:36 c513 kernel: </TASK>
Apr 29 19:19:36 c513 kernel: Modules linked in: tcp_diag inet_diag xt_mark vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd nf_conntrack_netlink xt_nat xt_tcpudp nfsv3 nfs_acl nfs lockd grace xt_conntrack xt_MASQUERADE xt_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat xfrm_user xfrm_algo overlay ccm cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter ipmi_devintf ipmi_msghandler scsi_transport_iscsi nf_tables nvme_fabrics nvme_keyring softdog sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink
Apr 29 19:19:36 c513 kernel: snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_sdca snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec_hdmi snd_soc_core snd_compress mt7921e ac97_bus snd_pcm_dmaengine mt7921_common mt792x_lib snd_hda_intel mt76_connac_lib kvm_intel snd_intel_dspcfg mt76 i915(OE) snd_intel_sdw_acpi nouveau kvm mxm_wmi polyval_clmulni snd_hda_codec drm_gpuvm polyval_generic mac80211 btusb ghash_clmulni_intel snd_hda_core gpu_sched drm_buddy sha256_ssse3 btrtl sha1_ssse3 drm_ttm_helper snd_hwdep btintel ttm aesni_intel btbcm snd_pcm btmtk drm_suballoc_helper drm_exec crypto_simd cmdlinepart drm_display_helper snd_timer cfg80211 bluetooth cryptd spi_nor cdc_acm cec rapl mei_me spd5118 snd wmi_bmof intel_cstate libarc4 rc_core mtd pcspkr soundcore mei i2c_algo_bit igen6_edac intel_pmc_core pmt_telemetry pmt_class intel_vsec acpi_pad acpi_tad mac_hid vhost_net vhost
Apr 29 19:19:36 c513 kernel: vhost_iotlb tap nct6775 nct6775_core hwmon_vid coretemp efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq uas usb_storage nvme xhci_pci spi_intel_pci thunderbolt i2c_i801 i40e igc nvme_core spi_intel i2c_smbus xhci_hcd i2c_mux nvme_auth libie video pinctrl_tigerlake wmi
Apr 29 19:19:36 c513 kernel: CR2: 0000000000000028
Apr 29 19:19:36 c513 kernel: ---[ end trace 0000000000000000 ]---
Apr 29 19:19:36 c513 kernel: RIP: 0010:syscall_exit_to_user_mode+0x39/0x1d0
Apr 29 19:19:36 c513 kernel: Code: 06 ff fa 0f 1f 44 00 00 0f 1f 44 00 00 65 4c 8b 2d 9c 28 43 71 49 8b 5d 00 f7 c3 1e 30 02 00 75 5a 48 89 df e8 58 b1 06 ff 0f <1f> 44 00 00 31 c0 b9 01 00 00 00 89 c2 66 90 5b 41 5c 41 5d 5d 31
Apr 29 19:19:36 c513 kernel: RSP: 0018:ffffbe8b8f8a3bd0 EFLAGS: 00010046
Apr 29 19:19:36 c513 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Apr 29 19:19:36 c513 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 29 19:19:36 c513 kernel: RBP: ffffbe8b8f8a3be8 R08: 0000000000000000 R09: 0000000000000000
Apr 29 19:19:36 c513 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffbe8b8f8a3f48
Apr 29 19:19:36 c513 kernel: R13: ffff9a8032c0aa40 R14: 0000000000000000 R15: 0000000000000000
Apr 29 19:19:36 c513 kernel: FS: 0000000000000000(0000) GS:ffff9a97af300000(0000) knlGS:0000000000000000
Apr 29 19:19:36 c513 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 29 19:19:36 c513 kernel: CR2: 0000000000000028 CR3: 000000147a282005 CR4: 0000000000f72ef0
Apr 29 19:19:36 c513 kernel: PKRU: 55555554
Apr 29 19:19:36 c513 kernel: note: pvestatd[89922] exited with irqs disabled
Apr 29 19:19:36 c513 kernel: Fixing recursive fault but reboot is needed!
Apr 29 19:19:36 c513 kernel: BUG: scheduling while atomic: pvestatd/89922/0x00000000
Apr 29 19:19:36 c513 kernel: Modules linked in: tcp_diag inet_diag xt_mark vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd nf_conntrack_netlink xt_nat xt_tcpudp nfsv3 nfs_acl nfs lockd grace xt_conntrack xt_MASQUERADE xt_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat xfrm_user xfrm_algo overlay ccm cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter ipmi_devintf ipmi_msghandler scsi_transport_iscsi nf_tables nvme_fabrics nvme_keyring softdog sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink
Apr 29 19:19:36 c513 kernel: snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_sdca snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec_hdmi snd_soc_core snd_compress mt7921e ac97_bus snd_pcm_dmaengine mt7921_common mt792x_lib snd_hda_intel mt76_connac_lib kvm_intel snd_intel_dspcfg mt76 i915(OE) snd_intel_sdw_acpi nouveau kvm mxm_wmi polyval_clmulni snd_hda_codec drm_gpuvm polyval_generic mac80211 btusb ghash_clmulni_intel snd_hda_core gpu_sched drm_buddy sha256_ssse3 btrtl sha1_ssse3 drm_ttm_helper snd_hwdep btintel ttm aesni_intel btbcm snd_pcm btmtk drm_suballoc_helper drm_exec crypto_simd cmdlinepart drm_display_helper snd_timer cfg80211 bluetooth cryptd spi_nor cdc_acm cec rapl mei_me spd5118 snd wmi_bmof intel_cstate libarc4 rc_core mtd pcspkr soundcore mei i2c_algo_bit igen6_edac intel_pmc_core pmt_telemetry pmt_class intel_vsec acpi_pad acpi_tad mac_hid vhost_net vhost
Apr 29 19:19:36 c513 kernel: vhost_iotlb tap nct6775 nct6775_core hwmon_vid coretemp efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq uas usb_storage nvme xhci_pci spi_intel_pci thunderbolt i2c_i801 i40e igc nvme_core spi_intel i2c_smbus xhci_hcd i2c_mux nvme_auth libie video pinctrl_tigerlake wmi
Apr 29 19:19:36 c513 kernel: CPU: 6 UID: 0 PID: 89922 Comm: pvestatd Tainted: P UD W OE 6.14.0-2-pve #1
Apr 29 19:19:36 c513 kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [D]=DIE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Apr 29 19:19:36 c513 kernel: Hardware name: Micro Computer (HK) Tech Limited Venus Series/AHWSA, BIOS AHWSA.1.22 03/12/2024
Apr 29 19:19:36 c513 kernel: Call Trace:
Apr 29 19:19:36 c513 kernel: <TASK>
Apr 29 19:19:36 c513 kernel: dump_stack_lvl+0x76/0xa0
Apr 29 19:19:36 c513 kernel: dump_stack+0x10/0x20
Apr 29 19:19:36 c513 kernel: __schedule_bug+0x64/0x80
Apr 29 19:19:36 c513 kernel: __schedule+0x1058/0x13f0
Apr 29 19:19:36 c513 kernel: ? vprintk+0x18/0x50
Apr 29 19:19:36 c513 kernel: ? _printk+0x60/0x90
Apr 29 19:19:36 c513 kernel: do_task_dead+0x43/0x50
Apr 29 19:19:36 c513 kernel: make_task_dead+0x142/0x160
Apr 29 19:19:36 c513 kernel: rewind_stack_and_make_dead+0x16/0x20
Apr 29 19:19:36 c513 kernel: RIP: 0033:0x7b77659e9409
Apr 29 19:19:36 c513 kernel: Code: Unable to access opcode bytes at 0x7b77659e93df.
Apr 29 19:19:36 c513 kernel: RSP: 002b:00007ffc6064eb78 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
Apr 29 19:19:36 c513 kernel: RAX: ffffffffffffffda RBX: 0000603f92b5f2a0 RCX: 00007b77659e9409
Apr 29 19:19:36 c513 kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
Apr 29 19:19:36 c513 kernel: RBP: 0000000000000001 R08: ffffffffffffff78 R09: 0000000000000000
Apr 29 19:19:36 c513 kernel: R10: 00007b7765927200 R11: 0000000000000206 R12: 000000000000001d
Apr 29 19:19:36 c513 kernel: R13: 0000000000000000 R14: 0000603f98ce8ba0 R15: 0000603f930ba208
Apr 29 19:19:36 c513 kernel: </TASK>
Apr 29 19:19:37 c513 pvestatd[2994440]: got timeout


Apr 29 19:32:55 c513 kernel: pvestatd[2994440]: segfault at 12 ip 0000603f7e9cb232 sp 00007ffc6064eb40 error 4 in perl[ff232,603f7e915000+195000] likely on CPU 6 (core 12, socket 0)
Apr 29 19:32:55 c513 kernel: Code: 20 49 03 41 28 81 60 0c ff ff 1f ff 0f 1f 40 00 45 84 f6 74 73 4c 89 e3 4d 39 fc 0f 84 bf 00 00 00 48 85 db 0f 84 91 00 00 00 <48> 8b 6b 08 4c 8b 23 f6 43 13 04 0f 84 5d ff ff ff 48 8d 05 56 ea
Apr 29 19:32:55 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 29 19:34:25 c513 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Apr 29 19:34:25 c513 systemd[1]: pvestatd.service: Killing process 89922 (pvestatd) with signal SIGKILL.
Apr 29 19:35:55 c513 systemd[1]: pvestatd.service: Processes still around after SIGKILL. Ignoring.
Apr 29 19:37:25 c513 systemd[1]: pvestatd.service: State 'final-sigterm' timed out. Killing.
Apr 29 19:37:25 c513 systemd[1]: pvestatd.service: Killing process 89922 (pvestatd) with signal SIGKILL.
Apr 29 19:38:55 c513 systemd[1]: pvestatd.service: Processes still around after final SIGKILL. Entering failed mode.
Apr 29 19:38:55 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 29 19:38:55 c513 systemd[1]: pvestatd.service: Unit process 89922 (pvestatd) remains running after unit stopped.
Apr 29 19:38:55 c513 systemd[1]: pvestatd.service: Consumed 1h 18min 42.480s CPU time.

 
pvestatd has failed again.

Please tell me if I should look at specific logs or services. Otherwise I'll keep posting error messages.

Before these logs, there are some messages that look interesting (maybe warnings or errors?). They precede the ones above (see the timestamps).

Apr 29 01:22:31 c513 pvestatd[3875]: modified cpu set for lxc/253: 3,16
Apr 29 01:22:31 c513 pvestatd[3875]: modified cpu set for lxc/254: 2,4
Apr 29 01:22:31 c513 pvestatd[3875]: modified cpu set for lxc/259: 17-18
Apr 29 01:22:31 c513 pvestatd[3875]: modified cpu set for lxc/300: 3,19
Apr 29 01:22:31 c513 pvestatd[3875]: modified cpu set for lxc/304: 4-5
Apr 29 04:02:46 c513 pvestatd[3875]: status update time (5.119 seconds)
Apr 29 04:45:12 c513 pvestatd[3875]: modified cpu set for lxc/100: 8,10
Apr 29 04:45:12 c513 pvestatd[3875]: modified cpu set for lxc/103: 7,14-15,18
Apr 29 04:45:41 c513 pvestatd[3875]: modified cpu set for lxc/103: 7,9,14-15
Apr 29 04:45:41 c513 pvestatd[3875]: modified cpu set for lxc/108: 18-19
Apr 29 05:15:11 c513 pvestatd[3875]: modified cpu set for lxc/109: 8,10
Apr 29 05:16:19 c513 pvestatd[3875]: VM 901 qmp command failed - VM 901 qmp command 'query-proxmox-support' failed - got timeout
Apr 29 05:16:20 c513 pvestatd[3875]: modified cpu set for lxc/100: 0,18
Apr 29 05:16:21 c513 pvestatd[3875]: status update time (9.462 seconds)
Apr 29 05:16:29 c513 pvestatd[3875]: VM 901 qmp command failed - VM 901 qmp command 'query-proxmox-support' failed - unable to conne>
Apr 29 05:16:30 c513 pvestatd[3875]: status update time (9.634 seconds)
Apr 29 05:16:40 c513 pvestatd[3875]: status update time (8.436 seconds)
Apr 29 05:30:11 c513 pvestatd[3875]: modified cpu set for lxc/100: 18-19
Apr 29 05:30:21 c513 pvestatd[3875]: modified cpu set for lxc/100: 0,19
Apr 29 05:30:21 c513 pvestatd[3875]: modified cpu set for lxc/254: 1-2
Apr 29 05:30:22 c513 pvestatd[3875]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/LXC.pm line 305.
Apr 29 05:30:22 c513 pvestatd[3875]: lxc console cleanup error: failed to read from command socket: Connection reset by peer
Apr 29 05:30:32 c513 pvestatd[3875]: modified cpu set for lxc/100: 16,19
Apr 29 05:30:41 c513 pvestatd[3875]: modified cpu set for lxc/300: 0,18
Apr 29 05:30:51 c513 pvestatd[3875]: modified cpu set for lxc/108: 3,19
Apr 29 05:32:11 c513 pvestatd[3875]: modified cpu set for lxc/189: 5-6,10-11
Apr 29 05:32:22 c513 pvestatd[3875]: modified cpu set for lxc/300: 2,18
Apr 29 05:32:23 c513 pvestatd[3875]: unable to get PID for CT 309 (not running?)
Apr 29 05:33:11 c513 pvestatd[3875]: modified cpu set for lxc/102: 0,11
Apr 29 05:33:11 c513 pvestatd[3875]: modified cpu set for lxc/108: 1,19
Apr 29 05:59:01 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 29 05:59:01 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 29 05:59:01 c513 systemd[1]: pvestatd.service: Consumed 2h 23min 7.263s CPU time.
Apr 29 07:59:22 c513 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 29 07:59:23 c513 pvestatd[2994440]: starting server
Apr 29 07:59:23 c513 systemd[1]: Started pvestatd.service - PVE Status Daemon.
Apr 29 12:52:53 c513 pvestatd[2994440]: Use of uninitialized value $fh in <HANDLE> at /usr/share/perl5/PVE/Tools.pm line 357.
Apr 29 12:52:53 c513 pvestatd[2994440]: VM 113 qmp command failed - VM 113 not running
Apr 29 12:52:53 c513 pvestatd[2994440]: VM 283 qmp command failed - VM 283 not running
Apr 29 12:52:53 c513 pvestatd[2994440]: VM 601 qmp command failed - VM 601 not running
Apr 29 12:52:53 c513 pvestatd[2994440]: VM 112 qmp command failed - VM 112 not running
Apr 29 12:52:53 c513 pvestatd[2994440]: VM 281 qmp command failed - VM 281 not running
Apr 29 12:52:53 c513 pvestatd[2994440]: VM 114 qmp command failed - VM 114 not running
Apr 29 12:52:53 c513 pvestatd[2994440]: VM 285 qmp command failed - VM 285 not running
Apr 29 12:52:53 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 30>
Apr 29 12:52:53 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 30>
Apr 29 12:52:53 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 31>
Apr 29 12:52:53 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 31>
Apr 29 12:52:53 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/Service/pvestatd.pm li>
Apr 29 12:52:53 c513 pvestatd[2994440]: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/Service/pvestatd.pm>
Apr 29 12:52:53 c513 pvestatd[2994440]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/PVE/Service/pv>
Apr 29 12:52:56 c513 pvestatd[2994440]: pbs: error fetching datastores - 401 Unauthorized
Apr 29 12:55:53 c513 pvestatd[2994440]: modified cpu set for lxc/109: 1,19
Apr 29 12:56:04 c513 pvestatd[2994440]: modified cpu set for lxc/198: 1,13
Apr 29 12:56:04 c513 pvestatd[2994440]: modified cpu set for lxc/259: 18-19
Apr 29 17:51:35 c513 pvestatd[2994440]: Use of uninitialized value $fh in <HANDLE> at /usr/share/perl5/PVE/ProcFSTools.pm line 171, >
Apr 29 17:51:35 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 30>
Apr 29 17:51:35 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 31>
Apr 29 17:51:35 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 31>
Apr 29 17:51:35 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 30>
Apr 29 17:51:35 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 31>
Apr 29 17:51:35 c513 pvestatd[2994440]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 31>
Apr 29 17:51:38 c513 pvestatd[2994440]: pbs: error fetching datastores - 401 Unauthorized
Apr 29 19:19:37 c513 pvestatd[2994440]: got timeout
Apr 29 19:32:55 c513 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 29 19:34:25 c513 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Apr 29 19:34:25 c513 systemd[1]: pvestatd.service: Killing process 89922 (pvestatd) with signal SIGKILL.
Apr 29 19:35:55 c513 systemd[1]: pvestatd.service: Processes still around after SIGKILL. Ignoring.
Apr 29 19:37:25 c513 systemd[1]: pvestatd.service: State 'final-sigterm' timed out. Killing.
Apr 29 19:37:25 c513 systemd[1]: pvestatd.service: Killing process 89922 (pvestatd) with signal SIGKILL.
Apr 29 19:38:55 c513 systemd[1]: pvestatd.service: Processes still around after final SIGKILL. Entering failed mode.
Apr 29 19:38:55 c513 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 29 19:38:55 c513 systemd[1]: pvestatd.service: Unit process 89922 (pvestatd) remains running after unit stopped.
Apr 29 19:38:55 c513 systemd[1]: pvestatd.service: Consumed 1h 18min 42.480s CPU time.
-- Boot 36aea0ed5d9740a498b064e581186788 --
Apr 29 21:39:50 c513 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 29 21:39:51 c513 pvestatd[3885]: starting server
Apr 29 21:39:51 c513 systemd[1]: Started pvestatd.service - PVE Status Daemon.