CPU Lockup on 5.4 Kernel

Reporting back with more data: 5.3.18-3 is currently at 1d 10h 42m uptime with no issues. Going to reboot into 5.4.30 and test that for a while.
 
Dear all,

With no VM running, I have had no uptime issues: 12:42:39 up 3 days

The issue only appears for me when running a VM in Proxmox.

This is on the 5.4 kernel.

Kindly
 
Well, I lasted about 2 days and whoa, the machine froze again. The first GPU shut off (Windows VM); the second (Kali) was still displaying but totally unresponsive. Network connectivity to the host was lost completely.

And of course - not a single thing in the logs, nada, zip...

Any way of forcing logging when the machine freezes, so at least I'd know what caused it..?
 
Dear all,

I have upgraded to the latest Proxmox 6.2-4 with kernel Linux 5.4.34-1-pve #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200).

Right now it is looking promising, and I will let you all know if the issue persists. Uptime is 3 hours with VMs running, which is good.

Yours
 
So far, after reseating the RAM modules, no freeze (yet):
Linux proxmox 5.4.30-1-pve #1 SMP PVE 5.4.30-1 (Fri, 10 Apr 2020 09:12:42 +0200) x86_64 GNU/Linux
22:21:47 up 3 days, 10:28, 1 user, load average: 3.87, 4.20, 4.28
 
Is the network card in your system a Realtek r8169-based one?
No, it is Intel.

Code:
23:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
24:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
 
Annoyingly, I got another lock up today when on kernel:
Code:
Linux proxmox 5.4.41-1-pve #1 SMP PVE 5.4.41-1 (Fri, 15 May 2020 15:06:08 +0200) x86_64

Text extracted from the photo:
Code:
Welcome to the Proxmox Virtual Environment. Please use your web browser to configure this server - connect to:

https://10.1.1.1:8006/

proxmox login: [ 474981.752276 ] INFO: task btrfs-transacti:20126 blocked for more than 120 seconds.

[ 474981.752293 ] Tainted: P OE 5.4.41-1-pve #1
[ 474981.752299 ] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[ 648490.718698 ] watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [pvesr:12782]
[ 648518.718124 ] watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [pvesr:12782]
[ 648546.717556 ] watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [pvesr:12782]
[ 648574.716982 ] watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [pvesr:12782]
[ 648602.716414 ] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [pvesr:12782]
[ 648630.715840 ] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [pvesr:12782]
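
For anyone wanting to check whether their own log shows the same pattern, here is a minimal sketch that tallies soft-lockup lines per CPU and task. The function name `tally_lockups` is made up for illustration, and it assumes nothing beyond the message format shown in the watchdog lines above.

```shell
# Summarise "soft lockup" lines from kernel log text on stdin.
# Prints one line per CPU/task pair in the form: CPU#<n> <task:pid> <count>
tally_lockups() {
    # Keep only the CPU and the [task:pid] part of each matching line
    sed -n 's/.*soft lockup - \(CPU#[0-9]*\) stuck for [0-9]*s! \[\([^]]*\)].*/\1 \2/p' |
        sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Possible usage (assumes a persistent journal, which may not be configured):
#   journalctl -k -b -1 | tally_lockups
```

Fed the six watchdog lines above, this reports a single pair, CPU#10 and pvesr:12782, with a count of 6, which at least shows the lockup stayed on one core and one process.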

EDIT: So I'm trying to debug this further - as I know Fedora and CentOS haven't failed me - but I'm looking at it from a hardware perspective first...

I pulled down 'zenstates.py' and inspected my setup - and I noticed the output as:
Code:
root@proxmox:~# zenstates.py -l 
P0 - Enabled - FID = 88 - DID = 8 - VID = 20 - Ratio = 34.00 - vCore = 1.35000
P1 - Enabled - FID = 78 - DID = 8 - VID = 2C - Ratio = 30.00 - vCore = 1.27500
P2 - Enabled - FID = 84 - DID = C - VID = 68 - Ratio = 22.00 - vCore = 0.90000
P3 - Disabled 
P4 - Disabled 
P5 - Disabled 
P6 - Disabled 
P7 - Disabled 
C6 State - Package - Enabled
C6 State - Core - Enabled
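
As a sanity check on the table above, the Ratio and vCore columns can be reproduced from the hex FID, DID and VID fields. This is only a sketch: the formulas (ratio = 2 * FID / DID, vCore = 1.55 V minus 0.00625 V per VID step) are inferred from the values shown here rather than taken from any documentation, and `decode_pstate` is a made-up helper name.

```shell
# Recompute Ratio and vCore for one P-state from its hex FID, DID and VID.
# Formulas are an assumption inferred from the zenstates output in this thread.
decode_pstate() {
    # Convert the hex fields to decimal using shell arithmetic
    local fid=$((0x$1)) did=$((0x$2)) vid=$((0x$3))
    awk -v f="$fid" -v d="$did" -v v="$vid" \
        'BEGIN { printf "Ratio = %.2f - vCore = %.5f\n", 2 * f / d, 1.55 - 0.00625 * v }'
}

decode_pstate 88 8 20   # P0 row
decode_pstate 78 8 2C   # P1 row
decode_pstate 84 C 68   # P2 row
```

The three calls give ratios of 34.00, 30.00 and 22.00 with vCore 1.35000, 1.27500 and 0.90000, matching the listing, so a changed FID or VID in a later listing can be read the same way.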

All good - but I'm pretty sure "C6 State - Package" is what the PSU idle workaround disables, for power supplies that need a non-zero minimum load on the 12 V rail.

I did a factory reset of the BIOS, then went in and set "Power Supply Idle Control" to "Typical Idle Current" - and the output in zenstates changed. I reapplied an overclock I hadn't used for ages (it's low usage that kills things, not high usage!), and now I get:
Code:
root@proxmox:~# zenstates.py -l 
P0 - Enabled - FID = 98 - DID = 8 - VID = 20 - Ratio = 38.00 - vCore = 1.35000 
P1 - Enabled - FID = 88 - DID = 8 - VID = 20 - Ratio = 34.00 - vCore = 1.35000 
P2 - Enabled - FID = 84 - DID = C - VID = 68 - Ratio = 22.00 - vCore = 0.90000 
P3 - Disabled 
P4 - Disabled 
P5 - Disabled 
P6 - Disabled 
P7 - Disabled 
C6 State - Package - Disabled 
C6 State - Core - Enabled

This is what I'd expect to see - so I assume something in the BIOS had changed, and it is now set the way I'd expect.

I'm back to letting things run for a while to see what happens. Unless I get further info, I'm going to assume it's a hardware issue for now.
 
I actually got rid of the freezes/lockups I was experiencing: right now 14 days uptime on kernel 5.4.34-1-pve with 4 VMs running (1 Windows VM that I run as a "daily driver", including gaming, 1 Linux VM for pentests, and 2 Ubuntu VMs for web servers etc.). What I changed:
- Disabled KSM
- Set kernel.hung_task_timeout_secs = 30
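
For reference, one way the two changes above could be applied at runtime is sketched below. The post does not say how they were actually made, so treat this as an assumption; `apply_tuning` and its optional root-prefix argument are hypothetical and exist only so the sketch can be tried against a scratch directory first.

```shell
# Sketch: apply the two settings listed above via /sys and /proc.
# $1 is a hypothetical path prefix (leave empty on a real host) for dry runs.
apply_tuning() {
    local root="${1:-}"
    # 2 = stop ksmd and unmerge all merged pages (0 would stop it but keep them merged)
    echo 2 > "$root/sys/kernel/mm/ksm/run"
    # Report hung tasks after 30 s rather than the kernel's usual 120 s default
    echo 30 > "$root/proc/sys/kernel/hung_task_timeout_secs"
}

# As root on the host:
#   apply_tuning
```

Neither write survives a reboot. On Proxmox the ksmtuned service may also switch KSM back on, so it would probably need disabling too (for example `systemctl disable --now ksmtuned`), and the timeout could be made persistent with a file under /etc/sysctl.d.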

I still see an occasional nvme0 queue timeout, but no freezes (yet). I'm almost "afraid" to update to the latest kernel :)
 
I just thought that I'd give some feedback since my last post.

Code:
# uptime
 02:43:41 up 7 days,  9:29,  2 users,  load average: 3.86, 4.37, 11.04

This is a good thing.

A question for the Proxmox folk if any happen to be watching - I noticed kernel.org is at 5.4.44 - but the latest proxmox kernel is 5.4.41.

What is your timeline for updates? For the kernels I build for Xen, I monitor kernel.org and rebuild new packages automatically within 6 hours of release, and normally have the kernel packages in the mirrors within an hour for CentOS 6, 7 and 8.

What type of integration is there for proxmox?
 
Still good news:
Code:
# uptime
 05:17:47 up 16 days, 12:04,  4 users,  load average: 0.25, 0.24, 0.48
# cat /proc/version 
Linux version 5.4.41-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.41-1 (Fri, 15 May 2020 15:06:08 +0200)
 
I noticed kernel.org is at 5.4.44 - but the latest proxmox kernel is 5.4.41. What is your timeline for updates? What type of integration is there for proxmox?

Our kernels are not directly based on kernel.org, but on the Ubuntu kernel series (usually the latest LTS, with some patches / config changes on top). We monitor both the kernel.org and Ubuntu upstreams, but don't automatically build upon each upstream release.
 