Hi all,
I'm running Proxmox VE 7.3-3 on a 3-node cluster. One of my nodes rebooted today after replication to it from the other two nodes failed. I've got a bunch of these messages in /var/log/syslog.
These are the entries from around the crashes:
Code:
Dec 27 10:17:27 Jaguar kernel: [ 0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Dec 27 10:17:27 Jaguar kernel: [ 10.690899] pstore: Using crash dump compression: deflate
Dec 27 10:54:46 Jaguar kernel: [ 0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Dec 27 10:54:46 Jaguar kernel: [ 3.752694] pstore: Using crash dump compression: deflate
Dec 27 11:03:39 Jaguar kernel: [ 0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Dec 27 11:03:39 Jaguar kernel: [ 3.755702] pstore: Using crash dump compression: deflate
And these are the more recent errors:
Code:
Dec 27 11:12:58 Jaguar kernel: [ 564.795861] x86/split lock detection: #AC: CPU 3/KVM/3327 took a split_lock trap at address: 0xfffff8064da24b6f
Dec 27 11:15:08 Jaguar kernel: [ 693.905122] x86/split lock detection: #AC: CPU 4/KVM/3328 took a split_lock trap at address: 0xfffff8064da24b6f
Dec 27 17:41:23 Jaguar kernel: [23869.265255] perf: interrupt took too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
Dec 27 18:22:06 Jaguar kernel: [26311.967076] x86/split lock detection: #AC: CPU 0/KVM/3324 took a split_lock trap at address: 0xfffff8064da79c9f
Dec 27 21:16:27 Jaguar kernel: [36773.504957] perf: interrupt took too long (3144 > 3138), lowering kernel.perf_event_max_sample_rate to 63500
Dec 27 22:03:23 Jaguar kernel: [39589.576370] x86/split lock detection: #AC: CPU 0/KVM/3324 took a split_lock trap at address: 0xfffff8064da79c9f
Dec 27 22:05:39 Jaguar kernel: [39725.185644] x86/split lock detection: #AC: CPU 1/KVM/3325 took a split_lock trap at address: 0xfffff8064da21396
I've tried restarting pveproxy and rebooting the node. All of my VMs that are in HA still show as being in HA, and replication is working. However, when I try to go to the node on port 8006, the web management console times out, and only on that node. If I try to access it from another node, only some things (like the remote shell for the main host) come up.
The system is an HP Elite 800 mini with an Intel i7-12700T (Alder Lake), 64GB DDR5 RAM, 2 NVMe 1TB SSDs in RAIDz1, a 1.92TB enterprise SATA SSD, and 64GB. There is an internal cluster interface on 192.168.x.x, and the external interfaces have public IPs, though port 8006 is blocked from outside the network. The system had an uptime of about 29 days until this happened. Any help is appreciated.
Thanks,
Bear