Upgraded to 5.0 headaches

ImageJPEG

Jul 14, 2017
So I just upgraded to 5.0-23 tonight.

When I try to boot the system, I now receive this error:

[28.072000] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]

The machine keeps printing variations of that error (the only thing that changes is the number of seconds it reports being stuck). It just hangs at this point and cannot continue booting.

I do have the machine working for the time being, but I definitely don't want to keep it like this. I'm currently booting it into the 4.4.67-1 kernel instead of the 4.10 one.

I also tried adding "nmi_watchdog=0" to my grub.cfg file, with no luck :/
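On Debian-based systems such as Proxmox VE, edits made directly to grub.cfg are usually lost, because `update-grub` regenerates that file. The persistent place for a kernel parameter is the `GRUB_CMDLINE_LINUX_DEFAULT` line in `/etc/default/grub`, followed by `update-grub` and a reboot. Note also that `nmi_watchdog=0` only turns off the hard-lockup detector; the "soft lockup" message comes from the separate soft-lockup detector (`nosoftlockup` is the switch for that one), and silencing either detector would not stop a CPU from actually hanging. The sketch below is a minimal illustration under those assumptions: `add_kernel_param` and `has_kernel_param` are hypothetical helpers, not Proxmox tools. One adds a parameter to the GRUB defaults text idempotently, the other checks whether the parameter reached the running kernel via `/proc/cmdline`.

```python
import re


def add_kernel_param(default_grub: str, param: str) -> str:
    """Return the text of /etc/default/grub with `param` appended to
    GRUB_CMDLINE_LINUX_DEFAULT, unless that exact token is already present.

    Assumes the value is double-quoted, as in a stock Debian install."""
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"(.*)"$', re.MULTILINE)
    match = pattern.search(default_grub)
    if match is None:
        # No such line yet: append a fresh one at the end of the file.
        return default_grub.rstrip("\n") + f'\nGRUB_CMDLINE_LINUX_DEFAULT="{param}"\n'
    params = match.group(2).split()
    if param in params:
        return default_grub  # already there, nothing to change
    params.append(param)
    new_line = f'{match.group(1)}"{" ".join(params)}"'
    return default_grub[: match.start()] + new_line + default_grub[match.end():]


def has_kernel_param(cmdline: str, name: str) -> bool:
    """True if `name` appears in a /proc/cmdline string, either as a bare
    flag or as name=value."""
    return any(tok == name or tok.startswith(name + "=") for tok in cmdline.split())


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
    print(add_kernel_param(sample, "nmi_watchdog=0"))
    # GRUB_DEFAULT=0
    # GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"
```

After writing the result back and running `update-grub`, the parameter only takes effect on the next boot; `has_kernel_param(open("/proc/cmdline").read(), "nmi_watchdog")` returning False afterwards would suggest the change never made it onto the kernel command line.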
 
Hi,
I've been having this issue since Proxmox 4.4, and it still persists.
Some research pointed to an LXC issue: when the allocated threads are exceeded, the kernel locks up with the NMI watchdog error "LXC user PID - CPU#$ stuck for xxs!" and finally freezes completely, requiring a reboot to recover.

Long story short: how can I set limits.cpu.allowance to 200ms (or some other value) in Proxmox?

I already tried with pct and by manually editing /etc/pve/lxc/lxcID.conf; both failed.
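As far as I know, `limits.cpu.allowance` is an LXD configuration key rather than something `pct` or the Proxmox container config understands, which may explain why both attempts failed. Proxmox exposes a comparable hard cap through the container's `cpulimit` option (e.g. `pct set <vmid> --cpulimit 2`), expressed as a number of CPUs' worth of time. Both map onto the kernel's CFS bandwidth control, where a group may use at most `quota` microseconds of CPU per `period`. The sketch below converts an LXD-style time allowance such as `200ms/100ms` into that quota/period pair and the equivalent `cpulimit` value. The function names are hypothetical, and treating a bare `200ms` as "per 100 ms" is an assumption made only for illustration.

```python
import re
from typing import Tuple

DEFAULT_PERIOD_US = 100_000  # assumed CFS period of 100 ms for a bare "Xms"


def parse_allowance(allowance: str) -> Tuple[int, int]:
    """Convert an LXD-style time allowance like "200ms/100ms" into
    (cfs_quota_us, cfs_period_us). A bare "200ms" is assumed to mean
    "per DEFAULT_PERIOD_US". Percentage allowances are not handled."""
    m = re.fullmatch(r"\s*(\d+)ms(?:\s*/\s*(\d+)ms)?\s*", allowance)
    if m is None:
        raise ValueError(f"unsupported allowance: {allowance!r}")
    quota_us = int(m.group(1)) * 1000
    period_us = int(m.group(2)) * 1000 if m.group(2) else DEFAULT_PERIOD_US
    if period_us == 0:
        raise ValueError("period must be greater than zero")
    return quota_us, period_us


def to_cpulimit(quota_us: int, period_us: int) -> float:
    """Proxmox-style cpulimit: how many CPUs' worth of time the quota allows."""
    return quota_us / period_us


if __name__ == "__main__":
    quota, period = parse_allowance("200ms/100ms")
    print(quota, period, to_cpulimit(quota, period))  # 200000 100000 2.0
```

In other words, "200 ms of CPU time every 100 ms" is the same cap as two full CPUs, which is what `cpulimit 2` expresses directly.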
 
I am not running ZFS either; all my nodes use plain ext4.
I'm fairly certain it has to do with users exceeding their CPU allocation with miners or other intensive tasks.

 
I'm not even mining. The only container I run is Ubuntu 16.04 LTS with my Unifi Controller. The rest of my VMs are KVM guests running FreeBSD/OpenBSD services.
 
Installed PVE 5.0.
Created a KVM guest based on the 2.4 kernel, 1 socket/2 cores/2048 MB, from the Vortexbox 2.4 ISO, default installer options (next, next...).
Installed cifs-utils in the Vortexbox guest and mounted two CIFS shares from a NAS inside the VM.
Started playing files from the NAS.
After 2-4 hours the system locks up with NMI watchdog soft lockups...
and the whole system is frozen. Nothing but the Vortexbox VM is running on the host.
Is there something about Vortexbox that could trigger PVE to lock up like this?
 
Ran a memory test for about 24 hours: no errors.
Booted PVE, and within 8 hours the system got stuck (no VMs running, only PVE itself):

INFO: rcu_sched detected stalls on CPU/tasks:
#2-...: (3 GPs behind) idle=c37/1/0 softirq=45621/45622 fqs=2864582
#detected by 0, t=5730637 Jiffies, g=57433, c=57432, q=466908
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [worker/3:0:12288]
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [worker/3:0:12199]
This repeats 12 times, then the whole sequence starts over.
The only difference is whether 12288 or 12199 appears first; it keeps alternating between those two threads.
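When a log repeats like this, tallying which CPU and which task each soft lockup names can make the pattern (here, two alternating worker threads on CPU#3) easier to see across hundreds of lines. A small sketch, assuming the messages follow the `CPU#N stuck for Ns! [comm:pid]` shape quoted in this thread; `parse_soft_lockups` is a hypothetical helper, not a Proxmox or kernel tool.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple

# Matches e.g. "soft lockup - CPU#3 stuck for 22s! [worker/3:0:12288]".
# The task name may itself contain ':', so the greedy (.+) leaves only
# the digits after the last ':' for the PID.
LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")


class Lockup(NamedTuple):
    cpu: int
    seconds: int
    comm: str
    pid: int


def parse_soft_lockups(lines: Iterable[str]) -> List[Lockup]:
    """Extract every soft-lockup report from kernel log lines, skipping the rest."""
    found = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            found.append(Lockup(int(m.group(1)), int(m.group(2)), m.group(3), int(m.group(4))))
    return found


if __name__ == "__main__":
    log = [
        "NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [worker/3:0:12288]",
        "NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [worker/3:0:12199]",
        "INFO: rcu_sched detected stalls on CPU/tasks:",
    ]
    lockups = parse_soft_lockups(log)
    print(Counter((l.cpu, l.pid) for l in lockups))
    # Counter({(3, 12288): 1, (3, 12199): 1})
```

Fed a full `dmesg` or journal dump, the counter shows at a glance whether the lockups stay pinned to one CPU and a couple of threads, as here, or wander across cores.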


I have tried running PVE on two different SSDs (Intel and Kingston)...

I tried following the various suggestions at https://forum.proxmox.com/threads/soft-lockup-cpu.25575/


Any ideas are welcome ...
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!