PVE 5.0 - locks/not responding

sorenb

Installed PVE 5.0 on an i5-750 with 32 GB RAM, an ASRock P55M Pro board, and an additional HPE 331T network card.
The system locks up and stops responding within 8 hours.

Loaded BIOS defaults and ran the memory test for about 24 hours -> no errors.
Booted PVE, and again within 2 hours the system got stuck (no VMs running, only PVE itself).

INFO: rcu_sched detected stalls on CPU/tasks:
#2-...: (3 GPs behind) idle=c37/1/0 softirq=45621/45622 fqs=2864582
#detected by 0, t=5730637 Jiffies, g=57433, c=57432, q=466908
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [worker/3:0:12288]
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [worker/3:0:12199]
This repeats 12 times, then the whole sequence starts over.
The only difference is whether 12288 or 12199 appears first; it still alternates between those two threads.

I have tried running PVE on two different SSDs, an Intel and a Kingston drive (both cleaned with Diskpart).

I tried to follow the various suggestions at https://forum.proxmox.com/threads/soft-lockup-cpu.25575/
without much luck.

Any ideas/suggestions are welcome ...
 

vooze

That's the problem :( The open source nvidia driver (nouveau) sucks (Nvidia's fault); you need to blacklist that shit. You will get a lower resolution (who cares, this is a server :D), BUT it works, trust me.

- nano /etc/modprobe.d/blacklist-nouveau.conf

paste this:

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

- echo options nouveau modeset=0 | tee -a /etc/modprobe.d/nouveau-kms.conf
- update-initramfs -u
- reboot.
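
After the reboot, it may be worth confirming that the module really stayed out of the kernel. The following is only a minimal sketch: `module_listed` is a hypothetical helper name made up for this example, and it assumes `lsmod` prints its usual header line followed by one module per line.

```shell
# Hypothetical helper: succeeds (exit 0) if the module named in $1 appears
# in lsmod-style output read from stdin. The first line (header) is skipped.
module_listed() {
    awk -v mod="$1" 'NR > 1 && $1 == mod { found = 1 } END { exit !found }'
}

# Check the running kernel. If lsmod is unavailable, the input is empty
# and the module is reported as not loaded.
if lsmod 2>/dev/null | module_listed nouveau; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi
```

If it still reports the module as loaded, the initramfs was probably not rebuilt, or the blacklist file did not end up under /etc/modprobe.d/.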

Problem goes away :)

I have the same GPU as you.

I suspect the problem started with Debian 9, though I have seen other people on this forum on Debian 8 (Proxmox 4.4) with the same problem. They suggested installing the nvidia driver (bad idea: it breaks a lot of shit and pulls in X11 etc.); this is the better solution.
 

sorenb

So far so good: I applied the change right after you posted it, and it is still humming along with nothing in the logs, so it seems good ;o)

27th of June ... PVE still humming ;o)
 

xtekrepair

Thank you so much for this. I just implemented this fix; I have a GT 710 in a dual Xeon E5-2640 v4 server with 128 GB of RAM and 6 TB of storage.
I was very worried that there was a serious flaw here. I will keep you posted on whether this fixes the problem. I am running a CPU torture test, and every time I have done that before, the server did not last 8 hours.
 
