Problem with web accessibility on one node.

Hi all,

I'm running Proxmox VE 7.3-3 on a 3-node cluster. One of my nodes rebooted today after replication to it from the 2 other nodes failed. I've got a bunch of these messages in /var/log/syslog:

The crashes were here:
Code:
Dec 27 10:17:27 Jaguar kernel: [    0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Dec 27 10:17:27 Jaguar kernel: [   10.690899] pstore: Using crash dump compression: deflate
Dec 27 10:54:46 Jaguar kernel: [    0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Dec 27 10:54:46 Jaguar kernel: [    3.752694] pstore: Using crash dump compression: deflate
Dec 27 11:03:39 Jaguar kernel: [    0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Dec 27 11:03:39 Jaguar kernel: [    3.755702] pstore: Using crash dump compression: deflate

And these are recent errors.

Code:
Dec 27 11:12:58 Jaguar kernel: [  564.795861] x86/split lock detection: #AC: CPU 3/KVM/3327 took a split_lock trap at address: 0xfffff8064da24b6f
Dec 27 11:15:08 Jaguar kernel: [  693.905122] x86/split lock detection: #AC: CPU 4/KVM/3328 took a split_lock trap at address: 0xfffff8064da24b6f
Dec 27 17:41:23 Jaguar kernel: [23869.265255] perf: interrupt took too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
Dec 27 18:22:06 Jaguar kernel: [26311.967076] x86/split lock detection: #AC: CPU 0/KVM/3324 took a split_lock trap at address: 0xfffff8064da79c9f
Dec 27 21:16:27 Jaguar kernel: [36773.504957] perf: interrupt took too long (3144 > 3138), lowering kernel.perf_event_max_sample_rate to 63500
Dec 27 22:03:23 Jaguar kernel: [39589.576370] x86/split lock detection: #AC: CPU 0/KVM/3324 took a split_lock trap at address: 0xfffff8064da79c9f
Dec 27 22:05:39 Jaguar kernel: [39725.185644] x86/split lock detection: #AC: CPU 1/KVM/3325 took a split_lock trap at address: 0xfffff8064da21396

I've tried restarting pveproxy and rebooting the node. All of my VMs that are in HA still report as being in HA, and replication is working. However, when I try to reach the node on port 8006, the web management console times out on that node only. If I try to access it through another node, only some things (like the remote shell for the main host) come up.

System is an HP Elite 800 mini with Intel i7-12700t Alder Lake, 64GB DDR5 RAM, 2 NVMe 1TB SSDs in RAIDz1, a 1.92TB enterprise SATA SSD, and 64GB. There is an internal cluster interface on 192.168.x.x and externals with public IPs though port 8006 is blocked from outside the network. The system had a 29 or so day uptime until this happened. Any help is appreciated.

Thanks,
Bear
 
Hi,

can you still reach the node via ssh? Or is it completely unreachable?

I've tried restarting pveproxy
Anything interesting when you check the status of pveproxy? Maybe also try reloading pvedaemon, just in case. Do your requests show up in /var/log/pveproxy/access.log?
 
I can ssh to it.

Status for pveproxy gives me:
Dec 28 08:43:10 Jaguar pveproxy[40475]: starting 2 worker(s)
Dec 28 08:43:10 Jaguar pveproxy[40475]: worker 2758053 started
Dec 28 08:43:10 Jaguar pveproxy[40475]: worker 2758054 started
Dec 28 08:43:10 Jaguar pveproxy[2758054]: unable to open log file '/var/log/pveproxy/access.log' - Permission denied
Dec 28 08:43:10 Jaguar pveproxy[2758053]: unable to open log file '/var/log/pveproxy/access.log' - Permission denied
Dec 28 08:43:10 Jaguar pveproxy[2757864]: worker exit
Dec 28 08:43:10 Jaguar pveproxy[40475]: worker 2757864 finished
Dec 28 08:43:10 Jaguar pveproxy[40475]: starting 1 worker(s)
Dec 28 08:43:10 Jaguar pveproxy[40475]: worker 2758055 started
Dec 28 08:43:10 Jaguar pveproxy[2758055]: unable to open log file '/var/log/pveproxy/access.log' - Permission denied

Catting /var/log/pveproxy/access.log gives me nothing.
root@Jaguar:/var/log/pveproxy# ls -la
total 53
drwx------  2 www-data www-data   10 Dec 27 10:17 .
drwxr-xr-x 15 root     root       87 Dec 28 00:00 ..
-rw-------  1 root     root        0 Dec 27 00:00 access.log
-rw-r-----  1 www-data www-data  115 Dec 27 10:17 access.log.1
-rw-r-----  1 www-data www-data  124 Dec 24 21:05 access.log.2.gz
-rw-r-----  1 www-data www-data  199 Dec 22 13:19 access.log.3.gz
-rw-r-----  1 www-data www-data  143 Dec 20 10:26 access.log.4.gz
-rw-r-----  1 www-data www-data  953 Dec 17 22:40 access.log.5.gz
-rw-r-----  1 www-data www-data  206 Dec 11 16:59 access.log.6.gz
-rw-r-----  1 www-data www-data 7204 Dec  8 12:47 access.log.7.gz
 
The permissions for the log file seem to be messed up. Try running
Code:
chown www-data:www-data access.log
chmod 640 access.log
and restart pveproxy, just for good measure.
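To confirm the state before and after the fix, here is a minimal sketch, assuming GNU `stat` (which Debian-based hosts ship). The helper name `log_perms` is made up for illustration and is not part of Proxmox; it only prints owner, group and octal mode so that a root-owned access.log stands out next to the rotated files.

```shell
#!/bin/sh
# log_perms is a hypothetical helper, not a Proxmox tool.
# It prints "owner:group mode" for the given file using GNU stat.
log_perms() {
    stat -c '%U:%G %a' "$1"
}

# On the affected node one would run, from any directory:
#   log_perms /var/log/pveproxy/access.log
# After the chown/chmod above this should report www-data:www-data 640.
```

Using the full path also avoids relying on the current directory being /var/log/pveproxy, which the chown and chmod commands above assume.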
 
Thanks, Leo - I've got web access back.

Now I'd like to figure out what exactly happened to mess up the perms, as well as what caused the system to do this prior to rebooting. Does it have something to do with the split lock detection errors?
 

Now I'd like to figure out what exactly happened to mess up the perms
Hmm, I've seen several threads now with the same problem, where the permissions for the access log were set to root-only. I would suggest opening a report on the bugtracker [1], so that others will also be able to discuss this issue.

Does it have something to do with the split lock detection errors?
Is this the first time it has happened? You can try turning split lock detection off; thread [2] should be a good reference.

Could you maybe post a few more details about your setup and the crash that occurred?

[1] https://bugzilla.proxmox.com/
[2] https://forum.proxmox.com/threads/x86-split-lock-detection.111544/
 
Sure, I’ll file one tonight once I get back from a trip.

This is the first time this has happened. The cluster has run for over 3 months without any issues except a bad NIC in another node.

What other info would you like on the setup? Any specific command outputs you think would be beneficial?
 
Added a Bug - https://bugzilla.proxmox.com/show_bug.cgi?id=4434 for the access log perms getting changed.

Also, I edited /etc/kernel/cmdline to contain:
root=ZFS=rpool/ROOT/pve-1 boot=zfs split_lock_detect=off

Ran proxmox-boot-tool refresh

On reboot, still seeing:
Code:
Dec 28 19:38:59 Jaguar kernel: [  157.133469] x86/split lock detection: #AC: CPU 5/KVM/3579 took a split_lock trap at address: 0xfffff8074aa79c9f
Dec 28 19:40:53 Jaguar kernel: [  271.086553] x86/split lock detection: #AC: CPU 0/KVM/3574 took a split_lock trap at address: 0xfffff8074aa79c9f
Dec 28 19:41:10 Jaguar kernel: [  288.946079] x86/split lock detection: #AC: CPU 3/KVM/3577 took a split_lock trap at address: 0xfffff8074aa79c9f
 
Is it all on one single line in the actual file? Here in the forum it doesn't look that way, but it needs to be. [1]

You can check the active/booted kernel command line with: cat /proc/cmdline

[1] https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_edit_kernel_cmdline
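Both checks can be scripted. This is a rough sketch only; the function names `cmdline_is_single_line` and `cmdline_has_option` are hypothetical helpers for illustration, not something proxmox-boot-tool provides. The first verifies that a file holds exactly one non-empty line, the second looks for an option as a whole space-separated word.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of proxmox-boot-tool.

# Succeeds only if the file contains exactly one non-empty line.
cmdline_is_single_line() {
    [ "$(grep -c '[^[:space:]]' "$1")" -eq 1 ]
}

# Succeeds if the option appears as a whole space-separated word.
cmdline_has_option() {
    tr ' ' '\n' < "$1" | grep -qxF -- "$2"
}

# Intended use on the node:
#   cmdline_is_single_line /etc/kernel/cmdline || echo "options are split across lines"
#   cmdline_has_option /proc/cmdline split_lock_detect=off && echo "active on this boot"
```

Checking /proc/cmdline rather than the file in /etc tells you what the running kernel actually booted with, which is the part that matters after a refresh and reboot.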
D'oh! It wasn't on a single line. Fixed. Thank you!

root@Jaguar:~# cat /proc/cmdline
initrd=\EFI\proxmox\5.15.74-1-pve\initrd.img-5.15.74-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs split_lock_detect=off
 
Also curious though - Is the commonality between systems crashing with split locks that KVM is running Windows Server guests? I've seen that mentioned a few times, and in the referenced thread. This issue took a month to show up, so I'm not sure if setting the kernel line option will resolve it or not...at least I won't know immediately.
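One way to narrow down which guest is involved: in the trap messages, the part before "took a split_lock trap" is the thread name and thread ID (e.g. "CPU 3/KVM" and 3327). A small sketch, assuming the syslog line format shown earlier in this thread; `split_lock_summary` is a hypothetical helper name, and it simply counts traps per thread ID and address.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log lines on stdin and prints
# "count tid address" for each split_lock trap, most frequent first.
# Assumes the "#AC: <comm>/<tid> took a split_lock trap at address: 0x..." format.
split_lock_summary() {
    sed -n 's/.*#AC: .*\/\([0-9][0-9]*\) took a split_lock trap at address: \(0x[0-9a-f]*\).*/\1 \2/p' |
        sort | uniq -c | sort -rn | awk '{print $1, $2, $3}'
}

# Intended use on the node:
#   grep split_lock /var/log/syslog | split_lock_summary
```

While the VM is still running, something like `ps -eLo pid,tid,args | awk '$2 == 3324'` should show the process that owns a given thread ID; thread IDs change whenever the VM or host restarts, so this only works for traps logged since the guest was last started.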
 
