kernel:BUG: soft lockup - CPU#6 stuck for 67s!

ictdude

I am running Proxmox V 3.1-12/93bf03d4

Lately I have been getting this error:

Message from syslogd: kernel:BUG: soft lockup - CPU#6 stuck for 67s! [kswapd0:140]

When this happens I can't log in to the Proxmox GUI and have to reboot.
What could be the problem, and how can I diagnose it?

Any advice? Is this software related?
 
We sometimes see this problem and have linked it to cheap network adapters that have only two MSI-X lines.
 
I have only seen this happen on LXC nodes (taking them down completely).

In the logs, I only saw:
Code:
Feb  6 19:50:50 dx411-s11 kernel: [564033.729200]  0000000000000286 00000000b1327e45 ffff8829c95e3c90 ffffffff813f9523
Feb  6 19:50:50 dx411-s11 kernel: [564033.729208]  ffff8829c95e3cc8 ffffffff81191ffb ffff882f2c0c5400 ffff882f2c0c5400
Feb  6 19:50:50 dx411-s11 kernel: [564033.729227]  [<ffffffff813f9523>] dump_stack+0x63/0x90
Feb  6 19:50:50 dx411-s11 kernel: [564033.729240]  [<ffffffff81191ffb>] ? find_lock_task_mm+0x3b/0x80
Feb  6 19:50:50 dx411-s11 kernel: [564033.729249]  [<ffffffff811fe78f>] ? mem_cgroup_iter+0x1cf/0x380
Feb  6 19:50:50 dx411-s11 kernel: [564033.729258]  [<ffffffff812014f7>] mem_cgroup_oom_synchronize+0x347/0x360
Feb  6 19:50:50 dx411-s11 kernel: [564033.729267]  [<ffffffff81192cc4>] pagefault_out_of_memory+0x44/0xc0
Feb  6 19:50:50 dx411-s11 kernel: [564033.729276]  [<ffffffff8106b723>] __do_page_fault+0x3e3/0x410
Feb  6 19:50:50 dx411-s11 kernel: [564033.729286]  [<ffffffff8185e3f8>] page_fault+0x28/0x30
Feb  6 19:50:50 dx411-s11 kernel: [564033.729297] memory: usage 1047064kB, limit 1048576kB, failcnt 763346
Feb  6 19:50:50 dx411-s11 kernel: [564033.729302] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Feb  6 19:50:50 dx411-s11 kernel: [564033.729337] Memory cgroup stats for /lxc/3493/user.slice: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Feb  6 19:50:50 dx411-s11 kernel: [564033.729393] Memory cgroup stats for /lxc/3493/user.slice/user-0.slice/session-c173.scope: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Feb  6 19:50:50 dx411-s11 kernel: [564033.729456] Memory cgroup stats for /lxc/3493/user.slice/user-0.slice/session-c176.scope: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Feb  6 19:50:50 dx411-s11 kernel: [564033.729513] Memory cgroup stats for /lxc/3493/user.slice/user-0.slice/session-c327.scope: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB

(repeating many times)
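
The "memory: usage ... limit ... failcnt ..." line in the trace above is the one that shows how close the container's cgroup was to its limit. As a rough sketch only (Python, not part of the original post; the names `MEM_LINE` and `cgroup_headroom` are made up for illustration), the numbers can be pulled out of a syslog line like this:

```python
import re

# Matches the memory-controller summary line printed during a cgroup OOM.
# It deliberately does not match the "kmem: usage ..." line that follows it.
MEM_LINE = re.compile(r"memory: usage (\d+)kB, limit (\d+)kB, failcnt (\d+)")


def cgroup_headroom(line):
    """Return (usage_kb, limit_kb, failcnt, headroom_kb), or None if the line does not match."""
    m = MEM_LINE.search(line)
    if m is None:
        return None
    usage, limit, failcnt = (int(g) for g in m.groups())
    return usage, limit, failcnt, limit - usage


# Sample line copied from the log above.
sample = (
    "Feb  6 19:50:50 dx411-s11 kernel: [564033.729297] "
    "memory: usage 1047064kB, limit 1048576kB, failcnt 763346"
)
print(cgroup_headroom(sample))
```

For the line above this gives a headroom of 1512 kB against a 1048576 kB limit, so the container was effectively at its limit when the trace was printed.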

And soon after

Code:
Feb  6 19:56:29 dx411-s11 kernel: [564373.050767] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s! [bash:27082]
Feb  6 19:56:29 dx411-s11 kernel: [564373.050835] Modules linked in: xt_recent tcp_diag inet_diag nfnetlink_queue bluetooth dm_snapshot ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_log_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common xt_LOG xt_limit iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security$
Feb  6 19:56:29 dx411-s11 kernel: [564373.050941] CPU: 31 PID: 27082 Comm: bash Tainted: P  O L  4.4.35-2-pve #1
Feb  6 19:56:29 dx411-s11 kernel: [564373.050943] Hardware name: Dell Inc. PowerEdge R420/0K29HN, BIOS 2.4.2 01/29/2015
Feb  6 19:56:29 dx411-s11 kernel: [564373.050945] task: ffff882c4a470e00 ti: ffff88300f22c000 task.ti: ffff88300f22c000
Feb  6 19:56:29 dx411-s11 kernel: [564373.050946] RIP: 0010:[<ffffffff811a1e26>]  [<ffffffff811a1e26>] get_lru_size+0x16/0x40
Feb  6 19:56:29 dx411-s11 kernel: [564373.050956] RSP: 0000:ffff88300f22f748  EFLAGS: 00000282
Feb  6 19:56:29 dx411-s11 kernel: [564373.050957] RAX: 0000000000000000 RBX: ffff88300f22f968 RCX: 0000000000000000
Feb  6 19:56:29 dx411-s11 kernel: [564373.050958] RDX: ffff88187fffbf80 RSI: 0000000000000001 RDI: ffff88153d5be340
Feb  6 19:56:29 dx411-s11 kernel: [564373.050959] RBP: ffff88300f22f850 R08: 0000000000000000 R09: 0000000000000000
Feb  6 19:56:29 dx411-s11 kernel: [564373.050960] R10: 000000000231571c R11: 0000000000000333 R12: 0000000000000000
Feb  6 19:56:29 dx411-s11 kernel: [564373.050961] R13: 0000000000000000 R14: 0000000000000001 R15: ffff88300f22f888
Feb  6 19:56:29 dx411-s11 kernel: [564373.050963] FS:  00007fb6b5eb7700(0000) GS:ffff88301f3c0000(0000) knlGS:0000000000000000
Feb  6 19:56:29 dx411-s11 kernel: [564373.050964] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb  6 19:56:29 dx411-s11 kernel: [564373.050966] CR2: 0000000000499050 CR3: 0000002c900e4000 CR4: 00000000001426e0
Feb  6 19:56:29 dx411-s11 kernel: [564373.050967] Stack:
Feb  6 19:56:29 dx411-s11 kernel: [564373.050969]  ffffffff811a5a3f ffff882dbbda0400 0000000000000000 0000000000000020
Feb  6 19:56:29 dx411-s11 kernel: [564373.050971]  ffff882dbbb41400 ffff882c00000003 ffff882f12fe0001 ffff882c00000000
Feb  6 19:56:29 dx411-s11 kernel: [564373.050973]  ffff88153d5be340 0000000000000000 0000000000000000 0000000000000000
Feb  6 19:56:29 dx411-s11 kernel: [564373.050975] Call Trace:
Feb  6 19:56:29 dx411-s11 kernel: [564373.050980]  [<ffffffff811a5a3f>] ? shrink_lruvec+0x12f/0x750
Feb  6 19:56:29 dx411-s11 kernel: [564373.050987]  [<ffffffff81116d56>] ? css_next_descendant_pre+0x46/0x60
Feb  6 19:56:29 dx411-s11 kernel: [564373.050992]  [<ffffffff811fe78f>] ? mem_cgroup_iter+0x1cf/0x380
Feb  6 19:56:29 dx411-s11 kernel: [564373.050995]  [<ffffffff811a614b>] shrink_zone+0xeb/0x2d0
Feb  6 19:56:29 dx411-s11 kernel: [564373.050997]  [<ffffffff811a64b3>] do_try_to_free_pages+0x183/0x480
Feb  6 19:56:29 dx411-s11 kernel: [564373.051000]  [<ffffffff811a69f4>] try_to_free_mem_cgroup_pages+0xc4/0x1a0
Feb  6 19:56:29 dx411-s11 kernel: [564373.051003]  [<ffffffff811fdd26>] try_charge+0x1a6/0x680
Feb  6 19:56:29 dx411-s11 kernel: [564373.051007]  [<ffffffff81201fdc>] mem_cgroup_try_charge+0x9c/0x1b0
Feb  6 19:56:29 dx411-s11 kernel: [564373.051010]  [<ffffffff8118f480>] __add_to_page_cache_locked+0x60/0x1f0
Feb  6 19:56:29 dx411-s11 kernel: [564373.051012]  [<ffffffff8118f667>] add_to_page_cache_lru+0x37/0x90
Feb  6 19:56:29 dx411-s11 kernel: [564373.051016]  [<ffffffff812e88a4>] ext4_mpage_readpages+0x184/0x920
Feb  6 19:56:29 dx411-s11 kernel: [564373.051022]  [<ffffffff811e1f72>] ? alloc_pages_current+0x92/0x120
Feb  6 19:56:29 dx411-s11 kernel: [564373.051027]  [<ffffffff8129aef6>] ext4_readpages+0x36/0x40
Feb  6 19:56:29 dx411-s11 kernel: [564373.051032]  [<ffffffff8119d757>] __do_page_cache_readahead+0x197/0x230
Feb  6 19:56:29 dx411-s11 kernel: [564373.051034]  [<ffffffff8118f6ed>] ? pagecache_get_page+0x2d/0x1b0
Feb  6 19:56:29 dx411-s11 kernel: [564373.051036]  [<ffffffff811912a0>] filemap_fault+0x360/0x3e0
Feb  6 19:56:29 dx411-s11 kernel: [564373.051040]  [<ffffffff811ccc41>] ? page_add_file_rmap+0x51/0x60
Feb  6 19:56:29 dx411-s11 kernel: [564373.051043]  [<ffffffff812a4066>] ext4_filemap_fault+0x36/0x50
Feb  6 19:56:29 dx411-s11 kernel: [564373.051048]  [<ffffffff811bda10>] __do_fault+0x50/0xe0
Feb  6 19:56:29 dx411-s11 kernel: [564373.051050]  [<ffffffff811c25e3>] handle_mm_fault+0x10f3/0x19c0
Feb  6 19:56:29 dx411-s11 kernel: [564373.051055]  [<ffffffff8106b772>] ? do_page_fault+0x22/0x30
Feb  6 19:56:29 dx411-s11 kernel: [564373.051060]  [<ffffffff8185e3f8>] ? page_fault+0x28/0x30
Feb  6 19:56:29 dx411-s11 kernel: [564373.051062]  [<ffffffff8106b4dd>] __do_page_fault+0x19d/0x410
Feb  6 19:56:29 dx411-s11 kernel: [564373.051067]  [<ffffffff81003885>] ? syscall_trace_enter_phase1+0xc5/0x140
Feb  6 19:56:29 dx411-s11 kernel: [564373.051069]  [<ffffffff8106b772>] do_page_fault+0x22/0x30
Feb  6 19:56:29 dx411-s11 kernel: [564373.051072]  [<ffffffff8185e3f8>] page_fault+0x28/0x30
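
When these messages repeat many times, it can help to tally which CPU and which task were stuck. A minimal sketch in Python, assuming the message format shown in this thread (the names `LOCKUP` and `parse_soft_lockup` are hypothetical, not from any Proxmox tool):

```python
import re

# "soft lockup - CPU#<n> stuck for <s>s! [<comm>:<pid>]"
# The task name may itself contain ':' (e.g. kworker/0:1), so the pid is
# taken from the digits after the last colon inside the brackets.
LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+):(\d+)\]")


def parse_soft_lockup(line):
    """Return a dict with cpu, seconds, comm and pid, or None if the line is not a soft lockup report."""
    m = LOCKUP.search(line)
    if m is None:
        return None
    cpu, seconds, comm, pid = m.groups()
    return {"cpu": int(cpu), "seconds": int(seconds), "comm": comm, "pid": int(pid)}


# Both message variants that appear in this thread.
lines = [
    "Message from syslogd: kernel:BUG: soft lockup - CPU#6 stuck for 67s! [kswapd0:140]",
    "Feb  6 19:56:29 dx411-s11 kernel: [564373.050767] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s! [bash:27082]",
]
for entry in lines:
    print(parse_soft_lockup(entry))
```

This only extracts what the kernel already printed; it does not say why the CPU was stuck. In the trace above the stuck task is in the memory cgroup reclaim path (try_charge, try_to_free_mem_cgroup_pages).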
 
Please don't hijack unrelated old threads. Open a new one and include the complete logs and the output of "pveversion -v".