Proxmox VE5 on ProLiant G5 Server CPU#0 stuck for 23s

einmalacht

New Member
Oct 24, 2015
Hello everyone,
I have been using Proxmox for a long time on some older HP servers.
But when I upgrade from VE 4 to VE 5, the system won't boot and shows the following message:

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s!

When I boot the old kernel from VE 4.4, the system boots without problems.
I disabled the NMI watchdog in GRUB, so cat /proc/sys/kernel/nmi_watchdog shows 0,
but it doesn't solve the problem.
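(For reference, this is the usual way to do it on Debian/Proxmox; the exact contents of GRUB_CMDLINE_LINUX_DEFAULT will differ per system:)

Code:
# /etc/default/grub -- add nmi_watchdog=0 to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"

# apply and reboot
update-grub
reboot

# verify after the reboot
cat /proc/sys/kernel/nmi_watchdog    # should print 0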

When I install a fresh Proxmox VE 5, I have the same problem right after installation:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s!

I can only boot the new VE 5 from the install image when I set acpi=off;
otherwise I get a black screen.

Can someone help me?
 
Hmm, yes, the G5 is really, really old, so maybe Debian 9 is too new for that server. Here is what you can do:
  • Install the latest firmware for this generation, including the HDD firmware (it is not included in the SPP).
  • Disable every NMI/HP watchdog feature in the BIOS.
  • Set the BIOS power mode to Static High Performance.
  • Stay on PVE 4.
 
@einmalacht: did you find a better solution? acpi=off works, but then I'm not able to shut down my server (it stays at "System halted" but doesn't power down).
 
Update from my side: I found a solution, but I'm not able to link to the Debian forum entry since I'm new here.
Conclusion: BIOS -> MPS Table Mode -> Disabled (not the default value). Then PVE 5 works without acpi=off.
 
I tried the kernel.watchdog line, but it didn't work for me; still the same error...
 
Does it still occur when you power off completely and do a hard reboot? I also found a link about HP DL360 G8 servers affected by the same issue. Perhaps it could help:

https://unix.stackexchange.com/questions/354368/nmi-watchdog-bug-soft-lockup

I would try replacing any CMOS/BBU batteries (be sure to back up your VMs/data first), and/or try a different power supply.

From the link above:

Check /etc/default/grub for lines that look like:

Code:
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=9600"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,9600"

If you need to maintain a serial console, change 9600 to 115200:

Code:
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200"

(Double-check that this applies and is safe on Proxmox.) Then regenerate the GRUB config with grub2-mkconfig -o /boot/grub2/grub.cfg

If you don't need a serial console, you can remove GRUB_SERIAL_COMMAND, update the other two lines, and regenerate the GRUB config:

Code:
GRUB_TERMINAL="console"
GRUB_CMDLINE_LINUX="console=tty0"
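
Note that the grub2-mkconfig path above comes from the linked answer (a RHEL-style system); on Proxmox/Debian the equivalent, assuming a standard install, is:

Code:
nano /etc/default/grub    # make the changes described above
update-grub               # regenerates /boot/grub/grub.cfg on Debian/Proxmox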
 
Yeah, it happens even after a complete power-off and hard reboot. Replacing the CMOS battery won't help; it happens with all 3 of my test lab servers.
Changing GRUB won't help either, because serial wasn't included in my config.

But as I stated in post #4, everything is working with the altered BIOS option. I still don't notice any downside.
 
I just hit this after upgrading to Proxmox VE 5 on a DL380 G4 server. The issue is not specific to Proxmox - I found mentions of similar issues on forums with kernel 4.9 as well as 4.10 on these generations of DL380 (in fact, if I try to boot the Debian Stretch installer on the same hardware, it hits the same issue).

I believe the problem is related to an interaction between kernels 4.9/4.10 and hyperthreading on this hardware. acpi=off disables hyperthreading (I believe), and "MPS Table Mode -> Disabled" does the same (as well as disabling multiple processors), hence why both work around the problem.

cybermcm - if your server is multi-processor, then I believe you'll find that setting "MPS Table Mode -> Disabled" will leave you with only one processor running (in addition to no hyperthreading). If your server is single-processor that's not really an issue, but if it has multiple processors then that will rather impact performance!

On the DL380 G4, and I'd imagine on the G5 as well, you can actually disable hyperthreading from the BIOS (it's also under advanced options) - I believe that's a better option than setting acpi to off, and certainly better than disabling MPS Table Mode.

So if you disable hyperthreading in the BIOS, my experience is that the server will boot and run fine with kernels 4.9/4.10 with ACPI left enabled (i.e. you don't need the acpi=off option, and you certainly don't need to disable MPS Table Mode). That leaves ACPI enabled (allowing power-down etc.), gives you multi-processor operation (if you have more than one CPU), just without hyperthreading (so you won't see as many logical processors as you would have under kernel 4.8, of course).
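
If it helps, you can verify what the kernel actually sees after changing those BIOS options (sockets, cores per socket, threads per core) with standard tools, e.g.:

Code:
lscpu | grep -E '^(CPU\(s\)|Thread|Core|Socket)'
# or count physical packages directly
grep 'physical id' /proc/cpuinfo | sort -u | wc -l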

The threads I read elsewhere about the same lockup seemed to suggest that kernel 4.11 doesn't exhibit the issue, so perhaps the same will be true of 4.12 (I haven't tested either myself). So if a later minor version of Proxmox VE 5 eventually moves to kernel 4.12, it's probably worth retesting the hardware with hyperthreading enabled to see whether the issue has gone away.
 
Skaffen: Thank you for your input, and you are right. I have two processors and now only one is working. I didn't notice this because only low-CPU-performance VMs are running. So this is an issue :-(
I searched within the BIOS options, but it seems there is no hyperthreading option on the affected server. My second lab host has different processors; there is an option for shutting down half the cores (I think this means hyperthreading).
Today I tested a blank Debian 9.1 on my server: same problem, booting not possible. I upgraded to kernel 4.12 and re-enabled MPS Table Mode -> it boots!

So for now I have to wait for Proxmox to update its kernel version...
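
In the meantime it is easy to keep an eye on which kernel the host is actually running and which pve-kernel packages are installed (standard commands, just for reference):

Code:
uname -r                   # currently running kernel
dpkg -l | grep pve-kernel  # pve-kernel packages installed on the host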
 
Update: Proxmox VE 5.1 is installed and now everything is working again (with MPS Table Mode enabled) thanks to the new Linux kernel...
Thank you, Proxmox team, for the upgrade!
 
Hello everyone, I can confirm that the installation of Proxmox 5.1 now runs.
The CPU stuck error is gone.

But I cannot start a KVM machine...

Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory

When I try to load the kvm_intel module:
modprobe: ERROR: could not insert 'kvm_intel': Input/output error

With Proxmox 4.4 everything works fine...
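
A few generic checks that can narrow this down (a sketch, assuming an Intel host; the actual reason usually shows up in the kernel log right after the failed modprobe):

Code:
# does the CPU advertise VT-x to the kernel at all?
grep -c vmx /proc/cpuinfo

# try loading the module and read the kernel log for the reason
modprobe kvm_intel
dmesg | tail -n 20 | grep -i kvm

If VT-x is disabled in the BIOS or unsupported, the kernel log typically says so explicitly.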
 
I did an in-place upgrade and my VMs are working fine right now. Did you do a fresh install or an upgrade?
 

This is my problem too: with the 4.10 kernel it worked almost fine with acpi=off (except for the reboot issue).
But the new 4.13 kernel seems to be lacking KVM support for those processors; I encountered the same errors (after apt dist-upgrade to Proxmox 5.1).
For now I have booted the latest old 4.10 kernel, and everything returned to normal.
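
If you want GRUB to pick the old kernel by default instead of selecting it manually on every boot, one way (a sketch; the exact menu entry titles come from your own grub.cfg) is:

Code:
# list the menu entry titles
grep -E "(menuentry|submenu) '" /boot/grub/grub.cfg | cut -d"'" -f2

# then point GRUB_DEFAULT in /etc/default/grub at the wanted entry, e.g.
# GRUB_DEFAULT="Advanced options for ...>... 4.10 ..."   (hypothetical title)
update-grub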
 
My G5 upgrade PVE 5 -> PVE 5.1 works out of the box.
Code:
root@pve# lsmod | grep kv
kvm_intel             200704  0
kvm                   581632  1 kvm_intel
irqbypass              16384  1 kvm
 
Maybe it is related to the CPU?
My error message:
Code:
Oct 25 11:48:52 host02 pvedaemon[3371]: start failed: command '/usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=ea5be438-ee91-4750-91e2-21a69337c471' -name test01 -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 1024 -k de -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:65d12d6d6210' -drive 'file=/var/lib/vz/template/iso/debian-9.2.1-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/h02R5/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=06:DE:0A:EB:8F:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1

Oct 25 11:48:52 host02 pvedaemon[990]: <root@pam> end task UPID:host02:00000D2B:00022BDC:59F05E04:qmstart:100:root@pam: start failed: command '/usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=ea5be438-ee91-4750-91e2-21a69337c471' -name test01 -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 1024 -k de -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:65d12d6d6210' -drive 'file=/var/lib/vz/template/iso/debian-9.2.1-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/h02R5/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=06:DE:0A:EB:8F:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1
 
