Kernel panics with 4.2

anigwei
Member
Hi!

I'm deploying a new server (Intel S2600WT2R) with Proxmox 4.2. It has a 2.8 TB hardware (MegaRAID) RAID1.

After doing intensive I/O (dumping the initial VMs), I've seen some strange kernel panics and I don't know what the cause is. Any ideas?

Thank you!!

Entire log: http://pastebin.com/PLKnEABP

[ 19.124998] vmbr0: port 1(eth0) entered forwarding state
[ 19.125007] vmbr0: port 1(eth0) entered forwarding state
[ 19.125147] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr0: link becomes ready
[ 20.492941] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 20.524592] ip_set: protocol 6
[ 98.459628] device tap400i0 entered promiscuous mode
[ 98.464789] vmbr0: port 2(tap400i0) entered forwarding state
[ 98.464797] vmbr0: port 2(tap400i0) entered forwarding state
[ 101.937493] kvm: zapping shadow pages for mmio generation wraparound
[ 101.942927] kvm: zapping shadow pages for mmio generation wraparound
[ 106.693088] kvm [3151]: vcpu0 unhandled rdmsr: 0x570
[ 106.693248] kvm [3151]: vcpu1 unhandled rdmsr: 0x570
[ 106.693396] kvm [3151]: vcpu2 unhandled rdmsr: 0x570
[ 106.693493] kvm [3151]: vcpu3 unhandled rdmsr: 0x570
[ 106.693623] kvm [3151]: vcpu4 unhandled rdmsr: 0x570
[ 106.693788] kvm [3151]: vcpu5 unhandled rdmsr: 0x570
[ 480.529253] INFO: task lvs:3298 blocked for more than 120 seconds.
[ 480.529350] Tainted: P O 4.4.8-1-pve #1
[ 480.529440] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 480.529564] lvs D ffff8807c13c79f8 0 3298 2461 0x00000000
[ 480.529569] ffff8807c13c79f8 ffff880466ffe800 ffff88046c3b44c0 ffff88086589a940
[ 480.529572] ffff8807c13c8000 ffff88046e797180 7fffffffffffffff ffff88086589a940
[ 480.529573] ffff88045d8f0500 ffff8807c13c7a10 ffffffff818448a5 0000000000000000
[ 480.529575] Call Trace:
[ 480.529585] [<ffffffff818448a5>] schedule+0x35/0x80
[ 480.529588] [<ffffffff81847ae5>] schedule_timeout+0x235/0x2d0
[ 480.529593] [<ffffffff813bc570>] ? generic_make_request+0x110/0x1f0
[ 480.529595] [<ffffffff81843dbb>] io_schedule_timeout+0xbb/0x140
[ 480.529600] [<ffffffff8124c7bc>] do_blockdev_direct_IO+0x1b1c/0x2be0
[ 480.529606] [<ffffffff813cd001>] ? exact_lock+0x11/0x20
[ 480.529609] [<ffffffff81247640>] ? I_BDEV+0x20/0x20
[ 480.529611] [<ffffffff8124d8c3>] __blockdev_direct_IO+0x43/0x50
[ 480.529613] [<ffffffff81247d18>] blkdev_direct_IO+0x58/0x80
[ 480.529616] [<ffffffff8118ebdf>] generic_file_read_iter+0x46f/0x5c0
[ 480.529618] [<ffffffff812480e7>] blkdev_read_iter+0x37/0x40
[ 480.529623] [<ffffffff8120bfe4>] new_sync_read+0x94/0xd0
[ 480.529624] [<ffffffff8120c046>] __vfs_read+0x26/0x40
[ 480.529626] [<ffffffff8120c686>] vfs_read+0x86/0x130
[ 480.529629] [<ffffffff8120d4f5>] SyS_read+0x55/0xc0
[ 480.529631] [<ffffffff818489b6>] entry_SYSCALL_64_fastpath+0x16/0x75
[ 548.495487] device tap100i0 entered promiscuous mode
 

anigwei
Member
Hi,

Something strange is happening related to storage...

A VM is also getting this kind of panic, related to jbd2/vda2-8 (storage?).

[ 230.361136] sched: RT throttling activated
[ 360.048079] INFO: task jbd2/vda2-8:147 blocked for more than 120 seconds.
[ 360.048151] Not tainted 3.19.0-49-generic #55~14.04.1-Ubuntu
[ 360.048194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.048242] jbd2/vda2-8 D ffff880036aa3b88 0 147 2 0x00000000
[ 360.048249] ffff880036aa3b88 ffff8800b8f613a0 0000000000013e80 ffff880036aa3fd8
[ 360.048251] 0000000000013e80 ffff880139b5a740 ffff8800b8f613a0 ffff880036aa3c30
[ 360.048253] ffff88013fc14778 ffff880036aa3c30 ffff88013ffc2898 0000000000000002
[ 360.048255] Call Trace:
[ 360.048287] [<ffffffff817b4380>] ? bit_wait+0x50/0x50
[ 360.048289] [<ffffffff817b3b50>] io_schedule+0xa0/0x130
[ 360.048291] [<ffffffff817b43ac>] bit_wait_io+0x2c/0x50
[ 360.048292] [<ffffffff817b3fe5>] __wait_on_bit+0x65/0x90
[ 360.048294] [<ffffffff817b4380>] ? bit_wait+0x50/0x50
[ 360.048296] [<ffffffff817b4082>] out_of_line_wait_on_bit+0x72/0x80
[ 360.048310] [<ffffffff810b4fa0>] ? autoremove_wake_function+0x40/0x40
[ 360.048320] [<ffffffff81220156>] __wait_on_buffer+0x36/0x40
[ 360.048326] [<ffffffff812bafbf>] jbd2_journal_commit_transaction+0x183f/0x1a80
[ 360.048331] [<ffffffff810dbeef>] ? try_to_del_timer_sync+0x4f/0x70
[ 360.048334] [<ffffffff812beb5b>] kjournald2+0xbb/0x240
[ 360.048336] [<ffffffff810b4f60>] ? prepare_to_wait_event+0x110/0x110
[ 360.048337] [<ffffffff812beaa0>] ? commit_timeout+0x10/0x10
[ 360.048344] [<ffffffff810938d2>] kthread+0xd2/0xf0
[ 360.048346] [<ffffffff81093800>] ? kthread_create_on_node+0x1c0/0x1c0
[ 360.048350] [<ffffffff817b7b58>] ret_from_fork+0x58/0x90
[ 360.048352] [<ffffffff81093800>] ? kthread_create_on_node+0x1c0/0x1c0
[ 360.048376] INFO: task java:1422 blocked for more than 120 seconds.
[ 360.048417] Not tainted 3.19.0-49-generic #55~14.04.1-Ubuntu
[ 360.048456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 

fabian
Proxmox Staff Member
Those are not kernel panics; the kernel is telling you that a kernel task has hung (waited) for more than 2 minutes. Usually this indicates either a deadlock in kernel space (which would be a bug) or severe I/O congestion (your case). If your host I/O is so congested that it slows to a crawl, it's only natural that VMs using the same storage experience the same congestion ;)
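
If you want to see the congestion directly, watching extended device stats on the host while a dump or restore is running usually makes it obvious (the device name below is just an example, use whatever your MegaRAID volume shows up as):

iostat -x 5
# sustained high await and ~100 %util on the RAID1 device (e.g. sda) while a
# dump is running means the array itself is saturated

The 120-second threshold in those messages is the kernel's hung task detector; it can be inspected or silenced via sysctl, though that only hides the warning and does not fix the congestion:

sysctl kernel.hung_task_timeout_secs
echo 0 > /proc/sys/kernel/hung_task_timeout_secs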
 

anigwei
Member

Hi Fabian,

Thanks for your answer.

I was thinking of I/O congestion too... but I have other servers (some older) and I've never seen I/O problems while transferring VMs onto them.

This is a brand new Intel S2600WT2R with an LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (Intel branded) and a RAID1 of SATA disks.

While transferring those VMs via NFS (and while those messages appeared), iostat showed about 80 MB/s.
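
(The exact iostat invocation isn't important, but something like the following shows per-device throughput together with latency, which helps tell array saturation from a slow NFS link:

iostat -xm 5
# -m reports throughput in MB/s, -x adds await and %util, refreshed every 5 s;
# high await / ~100 %util alongside the ~80 MB/s would point at the array rather than the network)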

Thanks!
 
