For the past few days I have been getting the errors below every day. Within a few minutes of the first message, the whole node crashes. I cannot run any commands, even though I was logged in through both the iDRAC console and SSH. The only solution so far is a reboot, but this hurts my uptime a lot.
According to the BIOS, the watchdog is disabled (Dell R420).
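As far as I understand, the BIOS option refers to a hardware watchdog, while the "NMI watchdog: BUG: soft lockup" message comes from the kernel's own lockup detector, which is configured through sysctl. A minimal sketch for checking its settings follows. The function name `show_watchdog` and its optional directory argument are made up for this example, and some keys may not exist on every kernel, in which case `n/a` is printed.

```shell
#!/bin/sh
# Print the kernel lockup-detector settings.
# show_watchdog is a hypothetical helper; the optional argument is the
# directory to read from and defaults to /proc/sys/kernel.
show_watchdog() {
    dir="${1:-/proc/sys/kernel}"
    for key in watchdog nmi_watchdog soft_watchdog watchdog_thresh; do
        if [ -r "$dir/$key" ]; then
            # Key exists: print it as key=value
            printf '%s=%s\n' "$key" "$(cat "$dir/$key")"
        else
            # Key is missing on this kernel
            printf '%s=n/a\n' "$key"
        fi
    done
}

show_watchdog
```

The kernel reports a soft lockup once a CPU has been stuck for more than twice `watchdog_thresh` seconds. With the default of 10 that is 20 seconds, which would be consistent with the "stuck for 22s" in the messages.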
Code:
Message from syslogd@dx411-s09 at Feb 12 16:16:51 ...
kernel:[214551.571665] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [queueprocd - pr:45363]
^C^C^C^C^C
Message from syslogd@dx411-s09 at Feb 12 16:18:51 ...
kernel:[214671.566741] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [queueprocd - pr:45363]
^C^C^C^C^X^Z^Z
Message from syslogd@dx411-s09 at Feb 12 16:19:19 ...
kernel:[214699.565592] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [queueprocd - pr:45363]
Message from syslogd@dx411-s09 at Feb 12 16:19:47 ...
kernel:[214727.564445] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [queueprocd - pr:45363]
Message from syslogd@dx411-s09 at Feb 12 16:20:55 ...
kernel:[214795.561655] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [queueprocd - pr:45363]
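To see at a glance which CPU and which task the messages point at, the log can be summarized with a few standard tools. This is only a sketch: the function name `summarize_lockups` is made up for the example, and it assumes the messages have exactly the format shown above.

```shell
#!/bin/sh
# Count soft lockup messages per CPU and task.
# Reads kernel log lines on stdin; summarize_lockups is a hypothetical helper.
summarize_lockups() {
    grep 'soft lockup' \
        | sed -n 's/.*soft lockup - \(CPU#[0-9]*\) stuck for [0-9]*s! \[\(.*\)].*/\1 \2/p' \
        | sort \
        | uniq -c
}
```

It can be fed from the kernel ring buffer with `dmesg | summarize_lockups`, or from a log file such as `/var/log/kern.log` (that path is an assumption and depends on the syslog configuration). Each output line is a count followed by the CPU and the task name from the brackets.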
Code:
pveversion -v
proxmox-ve: 4.4-79 (running kernel: 4.4.35-2-pve)
pve-manager: 4.4-12 (running version: 4.4-12/e71b7a74)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-108
pve-firmware: 1.1-10
libpve-common-perl: 4.0-91
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-73
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-1
pve-container: 1.0-93
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-1
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve14~bpo80