Massive Problems with Proxmox on Lenovo Flex x240 m5

Nov 28, 2016
Hey guys,

We're running Proxmox 4 on multiple Lenovo Flex x240 M5 nodes with dual E5 CPUs and 10GbE. From time to time, several nodes start acting up. Maybe someone can point me toward a fix.

Complete Bootlog: https://pastebin.com/raw/Wz2Ky8v8

Could this be related to Proxmox being installed on an SD card RAID (each blade has a two-SD-card RAID for the hypervisor)?
We've already replaced the mainboard and the CPU, without success.
Code:
[..]
[    3.093290] scsi 0:0:0:0: Direct-Access     ATA      ProxMox          0000 PQ: 0 ANSI: 6
[    3.093948] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    3.094154] sd 0:0:0:0: [sda] 61440000 512-byte logical blocks: (31.5 GB/29.3 GiB)
[    3.094502] sd 0:0:0:0: [sda] Write Protect is off
[    3.094504] sd 0:0:0:0: [sda] Mode Sense: 17 00 00 00
[    3.094829] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    3.106126]  sda: sda1 sda2 sda3
[    3.107578] sd 0:0:0:0: [sda] Attached SCSI disk
[  213.651043] raid6: sse2x1   gen()  7634 MB/s
[  213.719037] raid6: sse2x1   xor()  5984 MB/s
[  213.787039] raid6: sse2x2   gen()  9761 MB/s
[  213.855045] raid6: sse2x2   xor()  6595 MB/s
[  213.923048] raid6: sse2x4   gen() 11291 MB/s
[  213.991054] raid6: sse2x4   xor()  7875 MB/s
[  214.059056] raid6: avx2x1   gen() 15018 MB/s
[  214.127059] raid6: avx2x2   gen() 17438 MB/s
[  214.195062] raid6: avx2x4   gen() 20267 MB/s
[  214.195063] raid6: using algorithm avx2x4 gen() 20267 MB/s
[  214.195064] raid6: using avx2x2 recovery algorithm
[  214.195381] xor: automatically using best checksumming function:
[  214.235064]    avx       : 23388.000 MB/sec
[  214.239702] Btrfs loaded
[  214.984899] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[  240.428423] INFO: task systemd-udevd:249 blocked for more than 120 seconds.
[  240.428649]       Not tainted 4.4.62-1-pve #1
[  240.428790] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.429044] systemd-udevd   D ffff8820352cb928     0   249    242 0x00000004
[  240.429049]  ffff8820352cb928 000000000000000c ffff8810394f0000 ffff8820354d7000
[  240.429052]  ffff8820352cc000 ffff8820352cba88 ffff8820352cba80 ffff8820354d7000
[  240.429054]  ffff8810350b1060 ffff8820352cb940 ffffffff818605b5 7fffffffffffffff
[  240.429056] Call Trace:
[  240.429067]  [<ffffffff818605b5>] schedule+0x35/0x80
[  240.429071]  [<ffffffff81863805>] schedule_timeout+0x235/0x2d0
[  240.429077]  [<ffffffff813fbad2>] ? get_from_free_list+0x42/0x50
[  240.429084]  [<ffffffff810aca79>] ? try_to_wake_up+0x49/0x400
[  240.429086]  [<ffffffff8186102c>] wait_for_completion+0xbc/0x140
[  240.429088]  [<ffffffff810acec0>] ? wake_up_q+0x70/0x70
[  240.429093]  [<ffffffff8109b5a7>] flush_work+0x107/0x190
[  240.429096]  [<ffffffff810981d0>] ? destroy_worker+0x90/0x90
[  240.429098]  [<ffffffff8109b6a5>] work_on_cpu+0x75/0x90
[  240.429100]  [<ffffffff81097d10>] ? move_linked_works+0x90/0x90
[  240.429107]  [<ffffffff8144d620>] ? pci_device_shutdown+0x70/0x70
[  240.429111]  [<ffffffff8144eb3d>] pci_device_probe+0xed/0x140
[  240.429119]  [<ffffffff81565154>] driver_probe_device+0x224/0x4b0
[  240.429121]  [<ffffffff81565464>] __driver_attach+0x84/0x90
[  240.429123]  [<ffffffff815653e0>] ? driver_probe_device+0x4b0/0x4b0
[  240.429126]  [<ffffffff81562ccc>] bus_for_each_dev+0x6c/0xc0
[  240.429128]  [<ffffffff81564abe>] driver_attach+0x1e/0x20
[  240.429130]  [<ffffffff815645f1>] bus_add_driver+0x1f1/0x290
[  240.429132]  [<ffffffffc0092000>] ? 0xffffffffc0092000
[  240.429135]  [<ffffffff81565eb0>] driver_register+0x60/0xe0
[  240.429137]  [<ffffffff8144d01c>] __pci_register_driver+0x4c/0x50
[  240.429148]  [<ffffffffc00920a3>] megasas_init+0xa3/0x1000 [megaraid_sas]
[  240.429154]  [<ffffffff81002143>] do_one_initcall+0xd3/0x200
[  240.429160]  [<ffffffff811edb5a>] ? kmem_cache_alloc_trace+0x19a/0x210
[  240.429167]  [<ffffffff8118dbdc>] do_init_module+0x60/0x1d2
[  240.429170]  [<ffffffff8110a9b4>] load_module+0x2144/0x2630
[  240.429172]  [<ffffffff81106e10>] ? __symbol_put+0x60/0x60
[  240.429176]  [<ffffffff81215387>] ? kernel_read+0x57/0x90
[  240.429178]  [<ffffffff8110b0aa>] SYSC_finit_module+0x9a/0xd0
[  240.429180]  [<ffffffff8110b0fe>] SyS_finit_module+0xe/0x10
[  240.429183]  [<ffffffff818646f6>] entry_SYSCALL_64_fastpath+0x16/0x75
[  245.261202] systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
[  245.261670] systemd[1]: Detected architecture 'x86-64'.
[  245.336545] systemd[1]: Inserted module 'autofs4'
[  245.337366] systemd[1]: Set hostname to <PX1-C1-N09>.
[  245.487634] systemd-sysv-generator[391]: Ignoring creation of an alias umountiscsi.service for itself
[  245.615372] systemd[1]: Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory.
[  245.615700] systemd[1]: Starting Forward Password Requests to Wall Directory Watch.
[  245.615762] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
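
For anyone who wants to check a saved boot log for the same symptoms, here is a minimal sketch that flags long pauses between consecutive kernel timestamps and lists "blocked for more than" hung-task reports. The function name `scan_boot_log` and the 30-second threshold are assumptions made for illustration, not part of any Proxmox or kernel tooling; the sample lines are copied from the log above.

```python
import re

# Matches the "[  213.651043] message" prefix that dmesg puts on each line.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def scan_boot_log(lines, gap_threshold=30.0):
    """Return (gaps, hung_tasks) found in dmesg-style lines.

    gap_threshold (seconds) is an arbitrary value chosen for illustration.
    """
    gaps, hung = [], []
    prev = None  # (timestamp, message) of the previous parsed line
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a kernel timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if prev is not None and ts - prev[0] >= gap_threshold:
            gaps.append((prev[0], ts, prev[1], msg))
        if "blocked for more than" in msg:
            hung.append((ts, msg))
        prev = (ts, msg)
    return gaps, hung


# A few lines taken from the boot log in this post.
SAMPLE = """\
[    3.107578] sd 0:0:0:0: [sda] Attached SCSI disk
[  213.651043] raid6: sse2x1   gen()  7634 MB/s
[  214.984899] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[  240.428423] INFO: task systemd-udevd:249 blocked for more than 120 seconds.
""".splitlines()

if __name__ == "__main__":
    gaps, hung = scan_boot_log(SAMPLE)
    for start, end, before, after in gaps:
        print(f"{end - start:.1f}s gap after: {before}")
    for ts, msg in hung:
        print(f"hung task at {ts:.1f}s: {msg}")
```

On the sample lines this reports the pause of roughly 210 seconds between the SD card disk being attached and the raid6 benchmark, plus the blocked systemd-udevd task. To scan a full log, the same function could be fed the lines of a file saved from `dmesg`.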
 
And today this:

Code:
[  735.167063] Uhhuh. NMI received for unknown reason 2d on CPU 0.
[  735.167255] Do you have a strange power saving mode enabled?
[  735.167437] Dazed and confused, but trying to continue
[  746.545439] Uhhuh. NMI received for unknown reason 3d on CPU 0.
[  746.545644] Do you have a strange power saving mode enabled?
[  746.545827] Dazed and confused, but trying to continue
[  761.778350] Uhhuh. NMI received for unknown reason 2d on CPU 0.
[  761.778532] Do you have a strange power saving mode enabled?
[  761.778715] Dazed and confused, but trying to continue
[  778.653727] Uhhuh. NMI received for unknown reason 3d on CPU 0.
[  778.653928] Do you have a strange power saving mode enabled?
[  778.654098] Dazed and confused, but trying to continue

Message from syslogd@PX1-C1-N09 at Jul 27 07:52:53 ...
 kernel:[  796.945728] Uhhuh. NMI received for unknown reason 2d on CPU 0.

Message from syslogd@PX1-C1-N09 at Jul 27 07:52:53 ...
 kernel:[  796.945916] Do you have a strange power saving mode enabled?

Message from syslogd@PX1-C1-N09 at Jul 27 07:52:53 ...
 kernel:[  796.946097] Dazed and confused, but trying to continue

Message from syslogd@PX1-C1-N09 at Jul 27 07:53:14 ...
 kernel:[  818.028729] Uhhuh. NMI received for unknown reason 2d on CPU 0.

Message from syslogd@PX1-C1-N09 at Jul 27 07:53:14 ...
 kernel:[  818.028916] Do you have a strange power saving mode enabled?

Message from syslogd@PX1-C1-N09 at Jul 27 07:53:14 ...
 kernel:[  818.029098] Dazed and confused, but trying to continue

Message from syslogd@PX1-C1-N09 at Jul 27 07:53:31 ...
 kernel:[  835.038924] Uhhuh. NMI received for unknown reason 3d on CPU 0.

Message from syslogd@PX1-C1-N09 at Jul 27 07:53:31 ...
 kernel:[  835.039112] Do you have a strange power saving mode enabled?

Message from syslogd@PX1-C1-N09 at Jul 27 07:53:31 ...
 kernel:[  835.039293] Dazed and confused, but trying to continue
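
To see how often each NMI reason code shows up, a small tally like the following sketch can help. It assumes the message format shown above (a two-digit hex reason followed by the CPU number); the function name `count_nmi_reasons` is hypothetical, and the pattern handles both the plain dmesg lines and the syslogd broadcast lines.

```python
import re
from collections import Counter

# Pattern based on the message text seen in this post; the reason is hex.
NMI_RE = re.compile(r"NMI received for unknown reason ([0-9a-f]{2}) on CPU (\d+)")


def count_nmi_reasons(lines):
    """Count occurrences of each (reason code, CPU) pair in log lines."""
    counts = Counter()
    for line in lines:
        m = NMI_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


# A few lines taken from the output in this post.
SAMPLE = [
    "[  735.167063] Uhhuh. NMI received for unknown reason 2d on CPU 0.",
    "[  735.167255] Do you have a strange power saving mode enabled?",
    "[  746.545439] Uhhuh. NMI received for unknown reason 3d on CPU 0.",
    " kernel:[  796.945728] Uhhuh. NMI received for unknown reason 2d on CPU 0.",
]

if __name__ == "__main__":
    for (reason, cpu), n in sorted(count_nmi_reasons(SAMPLE).items()):
        print(f"reason {reason} on CPU {cpu}: {n}x")
```

This only counts the messages; it does not interpret what the reason codes mean on this hardware.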