Hi. I'm getting fairly regular crashes of pvestatd, roughly 2-3 times a day. pveproxy has also crashed occasionally, and sometimes a VM goes into an 'internal error' status shortly after starting up.
I ran memtest, first for 2 passes and then for another 6, with no errors. I have also reinstalled PVE, but I'm still getting the issues. What can I check next? I'm running fairly new hardware: a 14900K, an MSI Z790-A board and 192 GB of RAM, on the latest MSI BIOS.
Occasionally a VM goes into the 'Internal Error' state, and when that happens I see one of the CPU cores reporting 100 °C (I assume it's also running at 100%). I then have to stop and restart the VM.
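Because the 100 °C reading shows up around the time a VM fails, one low-effort check is to log core temperatures with timestamps and line them up against the crash times printed by `dmesg -T`. The following is a minimal sketch, assuming the Intel `coretemp` hwmon driver is loaded and exposes the usual `/sys/class/hwmon/hwmon*/temp*_label` and `temp*_input` files (values in millidegrees Celsius). The function name `read_coretemps` is hypothetical.

```python
from pathlib import Path


def read_coretemps(hwmon_root: str = "/sys/class/hwmon") -> dict:
    """Return {sensor label: degrees Celsius} for every coretemp sensor found.

    Assumes the coretemp driver: an hwmon directory whose `name` file reads
    "coretemp", containing temp*_input files and (usually) matching temp*_label files.
    """
    temps = {}
    for hwmon in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = hwmon / "name"
        if not name_file.is_file() or name_file.read_text().strip() != "coretemp":
            continue  # skip non-CPU sensors (NVMe, chipset, etc.)
        for inp in sorted(hwmon.glob("temp*_input")):
            label_file = inp.with_name(inp.name.replace("_input", "_label"))
            # Fall back to the raw file name if the driver provides no label.
            label = label_file.read_text().strip() if label_file.is_file() else inp.name
            # hwmon reports temperatures in millidegrees Celsius.
            temps[label] = int(inp.read_text().strip()) / 1000.0
    return temps
```

Calling it in a loop, for example `print(time.ctime(), read_coretemps())` followed by `time.sleep(5)`, produces a timestamped trace whose time format is close to that of `dmesg -T`, so temperature spikes can be compared directly with the crash timestamps.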
Thanks.
Code:
[Sat Jan 6 18:49:27 2024] x86/split lock detection: #AC: CPU 0/KVM/858994 took a split_lock trap at address: 0x26a8dce1888
[Sat Jan 6 18:49:27 2024] pveproxy worker[856467]: segfault at 9 ip 000055c80750812a sp 00007ffd34c97c00 error 4 in perl[55c80741f000+195000] likely on CPU 0 (core 0, socket 0)
[Sat Jan 6 18:49:27 2024] Code: ff 00 00 00 81 e2 00 00 00 04 75 11 49 8b 96 f8 00 00 00 48 89 10 49 89 86 f8 00 00 00 49 83 ae f0 00 00 00 01 4d 85 ff 74 19 <41> 8b 47 08 85 c0 0f 84 c2 00 00 00 83 e8 01 41 89 47 08 0f 84 05
[Sat Jan 6 18:49:35 2024] perf: interrupt took too long (3973 > 3920), lowering kernel.perf_event_max_sample_rate to 50250
Code:
[Sat Jan 6 09:35:29 2024] x86/split lock detection: #AC: CPU 0/KVM/524150 took a split_lock trap at address: 0xfffff80064e42fb3
[Sat Jan 6 15:33:18 2024] pvestatd[226428]: segfault at 107 ip 0000557ac583012a sp 00007ffe91c72d30 error 4 in perl[557ac5747000+195000] likely on CPU 0 (core 0, socket 0)
[Sat Jan 6 15:33:18 2024] Code: ff 00 00 00 81 e2 00 00 00 04 75 11 49 8b 96 f8 00 00 00 48 89 10 49 89 86 f8 00 00 00 49 83 ae f0 00 00 00 01 4d 85 ff 74 19 <41> 8b 47 08 85 c0 0f 84 c2 00 00 00 83 e8 01 41 89 47 08 0f 84 05
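Both segfaults above fault at the same spot: subtracting the mapping base from the instruction pointer gives offset 0xe912a into the reported `perl` mapping in each case (0x55c80750812a - 0x55c80741f000 and 0x557ac583012a - 0x557ac5747000), and the `Code:` bytes are identical. A minimal sketch for checking whether that pattern holds across a longer log is shown below. It assumes lines in the `dmesg -T` format shown here; `parse_segfaults`, `summarize` and `Segfault` are hypothetical names, and the offset is relative to the start of the mapping the kernel reports, not necessarily a file offset.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional

# Matches kernel lines such as:
#   pvestatd[226428]: segfault at 107 ip 0000557ac583012a sp 00007ffe91c72d30
#   error 4 in perl[557ac5747000+195000] likely on CPU 0 (core 0, socket 0)
# The process name may contain spaces ("pveproxy worker") but not brackets.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\[\]\s][^\[\]]*?)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>\d+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
    r"(?: likely on CPU (?P<cpu>\d+))?"
)


class Segfault(NamedTuple):
    comm: str
    pid: int
    obj: str
    offset: int  # faulting ip minus the start of the reported mapping
    cpu: Optional[int]


def parse_segfaults(text: str) -> list:
    """Extract every segfault line; other kernel messages are ignored."""
    faults = []
    for line in text.splitlines():
        m = SEGFAULT_RE.search(line)
        if not m:
            continue
        cpu = m["cpu"]
        faults.append(Segfault(
            comm=m["comm"],
            pid=int(m["pid"]),
            obj=m["obj"],
            offset=int(m["ip"], 16) - int(m["base"], 16),
            cpu=int(cpu) if cpu is not None else None,
        ))
    return faults


def summarize(faults):
    """Count faults per (binary, offset) site and per CPU."""
    by_site = Counter((f.obj, hex(f.offset)) for f in faults)
    by_cpu = Counter(f.cpu for f in faults)
    return by_site, by_cpu
```

Feeding it a saved `dmesg -T` dump, for example `summarize(parse_segfaults(open("dmesg.txt").read()))`, shows whether the crashes keep landing on the same offset and the same CPU.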
Code:
# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-7-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5: 6.5.11-7
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 18.2.0-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.3
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.5
pve-qemu-kvm: 8.1.2-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1