BUG: soft lockup - CPU#5 stuck for 22s! [kvm:30536]

chs_voks

New Member
Sep 21, 2016
Hello. Last week one of the cluster nodes hung. When I connected to the console, I saw the following:

[Attached image: i688^cimgpsh_orig.png]

Code:
proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
openvswitch-switch: 2.6.0-2

Code:
Linux fra-pve01 4.4.35-1-pve #1 SMP Fri Dec 9 11:09:55 CET 2016 x86_64 GNU/Linux

Proxmox 4.4 is running on an IBM Flex System x220 Compute Node. My cluster consists of 4 identical blades. Is this a problem with PVE, the kernel, or the hardware?
 
A soft lockup is the symptom of a task or kernel thread holding a CPU, without releasing it, for longer than allowed. It is a warning, not an error message.

Is the VM with process ID 30536 still running unaffected?
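
To make that check concrete, here is a minimal sketch, assuming the console line has the same shape as the one in the thread title. It pulls the CPU number, stall time, task name and PID out of the message and then looks up the task state in /proc. The helper names (parse_soft_lockup, process_state, softlockup_threshold) are made up for illustration and are not part of any Proxmox or kernel tool. As far as I know the kernel warns once a CPU has been stuck for twice the value of kernel.watchdog_thresh, so the last helper just reads that sysctl and doubles it.

```python
import re
from pathlib import Path

# Matches lines such as: BUG: soft lockup - CPU#5 stuck for 22s! [kvm:30536]
# The task name may itself contain colons (e.g. kworker/0:1), so the PID is
# taken as the digits after the last colon inside the brackets.
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return cpu, seconds, comm and pid from a soft lockup line, or None."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "seconds": int(m.group("secs")),
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
    }


def process_state(pid, proc_root="/proc"):
    """Return the 'State:' field of /proc/<pid>/status, or None if unreadable."""
    status = Path(proc_root) / str(pid) / "status"
    try:
        for row in status.read_text().splitlines():
            if row.startswith("State:"):
                return row.split(":", 1)[1].strip()
    except OSError:
        return None  # process no longer exists or /proc is not available
    return None


def softlockup_threshold(path="/proc/sys/kernel/watchdog_thresh"):
    """Return 2 * kernel.watchdog_thresh in seconds, or None if unreadable."""
    try:
        return 2 * int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    info = parse_soft_lockup("BUG: soft lockup - CPU#5 stuck for 22s! [kvm:30536]")
    print(info)
    print("task state:", process_state(info["pid"]))
    print("soft lockup threshold (s):", softlockup_threshold())
```

A state of "D (disk sleep)" that never changes, or a missing /proc entry, would suggest the VM did not come through unaffected. An "R" or "S" state means the task at least still exists and is schedulable.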
 
I had to manually move the VMs to another node; after that, they started successfully.
 
This is a bit of a shot in the dark, as this message can have many root causes: did you disable the power-saving options in your server BIOS?
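
If you cannot get into the BIOS right away, a rough sketch like the following shows which CPU idle states the running kernel exposes through the cpuidle sysfs interface. This only reflects the OS view and is not a substitute for checking the BIOS settings themselves. The function name list_cpuidle_states is hypothetical, and the cpuidle directory may be absent on some systems, in which case the list is simply empty.

```python
from pathlib import Path


def list_cpuidle_states(cpu=0, sys_root="/sys/devices/system/cpu"):
    """Return (name, disabled) pairs for each idle state exposed for one CPU."""
    base = Path(sys_root) / f"cpu{cpu}" / "cpuidle"
    # Directories are named state0, state1, ...; sort them numerically.
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    states = []
    for state_dir in state_dirs:
        try:
            name = (state_dir / "name").read_text().strip()
            disabled = (state_dir / "disable").read_text().strip() == "1"
        except OSError:
            continue  # skip states whose attributes cannot be read
        states.append((name, disabled))
    return states


if __name__ == "__main__":
    for name, disabled in list_cpuidle_states(cpu=0):
        print(f"{name}: {'disabled' if disabled else 'enabled'}")
```

If deep C-states still show up as enabled here, the power-saving options are probably still active as far as the OS is concerned.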
 
@chs_voks,

Were you able to get anywhere with this? I am seeing this same issue every 1-2 weeks on one of my nodes and having trouble pinpointing a root cause.

If not, does anyone else have any information?