Random crashes

markchen

Jan 16, 2022
Proxmox VE randomly becomes completely unresponsive. Looking at the latest syslog, it appears that it encountered an error and attempted to reboot but was not able to. Below are excerpts from the syslog.

Jan 16 02:12:18 UIG-VM kernel: x86/split lock detection: #AC: kvm/1095 took a split_lock trap at address: 0xfffff80462cd7aa3
Jan 16 02:17:01 UIG-VM CRON[7610]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jan 16 02:17:01 UIG-VM CRON[7611]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jan 16 02:17:01 UIG-VM CRON[7610]: pam_unix(cron:session): session closed for user root
Jan 16 02:17:30 UIG-VM kernel: x86/split lock detection: #AC: kvm/1096 took a split_lock trap at address: 0xfffff80462cd7aa3
Jan 16 02:18:08 UIG-VM sshd[3684]: pam_unix(sshd:session): session closed for user root
Jan 16 02:18:08 UIG-VM systemd[1]: session-5.scope: Succeeded.
Jan 16 02:18:08 UIG-VM systemd-logind[681]: Session 5 logged out. Waiting for processes to exit.
Jan 16 02:18:08 UIG-VM systemd-logind[681]: Removed session 5.
Jan 16 02:18:18 UIG-VM systemd[1]: Stopping User Manager for UID 0...
Jan 16 02:18:18 UIG-VM systemd[3687]: Stopped target Main User Target.
Jan 16 02:18:18 UIG-VM systemd[3687]: Stopped target Basic System.
Jan 16 02:18:18 UIG-VM systemd[3687]: Stopped target Paths.
Jan 16 02:18:18 UIG-VM systemd[3687]: Stopped target Sockets.
Jan 16 02:18:18 UIG-VM systemd[3687]: Stopped target Timers.
Jan 16 02:18:18 UIG-VM systemd[3687]: dirmngr.socket: Succeeded.
Jan 16 02:18:18 UIG-VM systemd[3687]: Closed GnuPG network certificate management daemon.
Jan 16 02:18:18 UIG-VM systemd[3687]: gpg-agent-browser.socket: Succeeded.
Jan 16 02:18:18 UIG-VM systemd[3687]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jan 16 02:18:18 UIG-VM systemd[3687]: gpg-agent-extra.socket: Succeeded.
Jan 16 02:18:18 UIG-VM systemd[3687]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Jan 16 02:18:18 UIG-VM systemd[3687]: gpg-agent-ssh.socket: Succeeded.
Jan 16 02:18:18 UIG-VM systemd[3687]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Jan 16 02:18:18 UIG-VM systemd[3687]: gpg-agent.socket: Succeeded.
Jan 16 02:18:18 UIG-VM systemd[3687]: Closed GnuPG cryptographic agent and passphrase cache.
Jan 16 02:18:18 UIG-VM systemd[3687]: Removed slice User Application Slice.
Jan 16 02:18:18 UIG-VM systemd[3687]: Reached target Shutdown.
Jan 16 02:18:18 UIG-VM systemd[3687]: systemd-exit.service: Succeeded.
Jan 16 02:18:18 UIG-VM systemd[3687]: Finished Exit the Session.
Jan 16 02:18:18 UIG-VM systemd[3687]: Reached target Exit the Session.
Jan 16 02:18:18 UIG-VM systemd[1]: user@0.service: Succeeded.
Jan 16 02:18:18 UIG-VM systemd[1]: Stopped User Manager for UID 0.
Jan 16 02:18:18 UIG-VM systemd[1]: Stopping User Runtime Directory /run/user/0...
Jan 16 02:18:18 UIG-VM systemd[1]: run-user-0.mount: Succeeded.
Jan 16 02:18:18 UIG-VM systemd[1]: user-runtime-dir@0.service: Succeeded.
Jan 16 02:18:18 UIG-VM systemd[1]: Stopped User Runtime Directory /run/user/0.
Jan 16 02:18:18 UIG-VM systemd[1]: Removed slice User Slice of UID 0.
Jan 16 02:23:45 UIG-VM kernel: x86/split lock detection: #AC: kvm/1096 took a split_lock trap at address: 0xfffff80462cd7aa3
-- Reboot --

After I discovered the system was down again, I restarted it. Below is part of the log from that boot.

Jan 16 12:48:49 UIG-VM kernel: Linux version 5.13.19-2-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.13.19-4 (Mon, 29 Nov 2021 12:10:09 +0100) ()
Jan 16 12:48:49 UIG-VM kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.19-2-pve root=/dev/mapper/pve-root ro quiet
Jan 16 12:48:49 UIG-VM kernel: KERNEL supported cpus:
Jan 16 12:48:49 UIG-VM kernel: Intel GenuineIntel
Jan 16 12:48:49 UIG-VM kernel: AMD AuthenticAMD
Jan 16 12:48:49 UIG-VM kernel: Hygon HygonGenuine
Jan 16 12:48:49 UIG-VM kernel: Centaur CentaurHauls
Jan 16 12:48:49 UIG-VM kernel: zhaoxin Shanghai
Jan 16 12:48:49 UIG-VM kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks


It appears that the random reboot/crash is being caused by the split lock issue. Is there a patch or maybe a workaround?
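To check whether the split_lock traps really line up with the crashes, the traps in the log can be counted per process. The boot message above says user-space split locks only produce a warning, so this is a rough sketch for gathering evidence, not a fix. The function name `count_split_lock_traps` and the regular expression are my own assumptions, written against the line format shown in the excerpts above.

```python
import re
from collections import Counter

# Assumed line format, taken from the syslog excerpt in this post:
# "... kernel: x86/split lock detection: #AC: kvm/1095 took a split_lock trap at address: 0x..."
TRAP_RE = re.compile(
    r"#AC: (?P<comm>[^/\s]+)/(?P<pid>\d+) took a split_lock trap "
    r"at address: (?P<addr>0x[0-9a-fA-F]+)"
)


def count_split_lock_traps(lines):
    """Count split_lock traps per (process name, PID, address).

    Hypothetical helper; `lines` is any iterable of log lines.
    """
    counts = Counter()
    for line in lines:
        match = TRAP_RE.search(line)
        if match:
            key = (match["comm"], int(match["pid"]), match["addr"])
            counts[key] += 1
    return counts


# Small demo using lines copied from the excerpt above.
sample = [
    "Jan 16 02:12:18 UIG-VM kernel: x86/split lock detection: #AC: kvm/1095 took a split_lock trap at address: 0xfffff80462cd7aa3",
    "Jan 16 02:17:30 UIG-VM kernel: x86/split lock detection: #AC: kvm/1096 took a split_lock trap at address: 0xfffff80462cd7aa3",
    "Jan 16 02:23:45 UIG-VM kernel: x86/split lock detection: #AC: kvm/1096 took a split_lock trap at address: 0xfffff80462cd7aa3",
]
for (comm, pid, addr), n in count_split_lock_traps(sample).most_common():
    print(f"{n} trap(s): {comm}/{pid} at {addr}")
```

On a real system the same function could be fed the full log, for example by iterating over an open syslog file instead of the `sample` list.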

Below are my machine specifications:

Intel NUC11PAHi7
32GB DDR4 SO-DIMM 3200
Inland NVMe M.2 256GB 3D NAND
Inland SATA III 2.5" 1TB SSD
External Thunderbolt 3 1TB spinning drive for backup
 
I replaced the RAM and the NUC itself, but the crash is still occurring. Looking at the system log, it might be a different issue now. Below is the log from the latest crash.

Jan 22 15:22:19 UIG-VM pvedaemon[4028]: INFO: starting new backup job: vzdump 100 --remove 0 --node UIG-VM --mode snapshot --compress zstd --storage Backup
Jan 22 15:22:19 UIG-VM pvedaemon[4028]: INFO: Starting Backup of VM 100 (qemu)
Jan 22 15:22:38 UIG-VM pvedaemon[1042]: <root@pam> starting task UPID:UIG-VM:00001012:00016064:61EC91BE:vzdump:101:root@pam:
Jan 22 15:22:38 UIG-VM pvedaemon[4114]: INFO: trying to get global lock - waiting...
Jan 22 15:22:54 UIG-VM systemd[1]: Starting Cleanup of Temporary Directories...
Jan 22 15:22:54 UIG-VM systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Jan 22 15:22:54 UIG-VM systemd[1]: Finished Cleanup of Temporary Directories.
Jan 22 15:23:44 UIG-VM pvedaemon[1041]: <root@pam> successful auth for user 'root@pam'
-- Reboot --
Jan 22 15:29:27 UIG-VM kernel: Linux version 5.15.7-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.7-1 (Tue, 14 Dec 2021 16:42:34 +0100) ()
Jan 22 15:29:27 UIG-VM kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.7-1-pve root=/dev/mapper/pve-root ro quiet
Jan 22 15:29:27 UIG-VM kernel: KERNEL supported cpus:
Jan 22 15:29:27 UIG-VM kernel: Intel GenuineIntel
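Since both crashes end with a `-- Reboot --` marker, comparing the last few lines logged before each marker may help tell whether the two crashes share a cause. Below is a minimal sketch of that idea. The function name `lines_before_reboots` and the `context` parameter are hypothetical, and it assumes the marker appears exactly as it does in the excerpts above.

```python
from collections import deque


def lines_before_reboots(lines, context=5):
    """Return, for each '-- Reboot --' marker, the last `context` lines before it.

    Hypothetical helper; `lines` is any iterable of log lines.
    """
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == "-- Reboot --":
            results.append(list(recent))
            recent.clear()  # start a fresh window for the next boot
        else:
            recent.append(line)
    return results


# Small demo using lines copied from the excerpts in this thread.
demo = [
    "Jan 22 15:22:38 UIG-VM pvedaemon[4114]: INFO: trying to get global lock - waiting...",
    "Jan 22 15:22:54 UIG-VM systemd[1]: Finished Cleanup of Temporary Directories.",
    "Jan 22 15:23:44 UIG-VM pvedaemon[1041]: <root@pam> successful auth for user 'root@pam'",
    "-- Reboot --",
    "Jan 22 15:29:27 UIG-VM kernel: KERNEL supported cpus:",
]
for i, tail in enumerate(lines_before_reboots(demo, context=2), start=1):
    print(f"Before reboot {i}:")
    for entry in tail:
        print("  " + entry)
```

A small `context` keeps the output readable. Raising it shows more of what ran before the hang, such as the vzdump backup job in the second excerpt.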
 
