PVE 4.0 Beta 2: Difficulties with zfs {Large kmem_alloc}

windinternet

Oct 8, 2015
Hello all,

We are seeing intermittent crashes on a PVE 4.0 Beta 2 server. The server loses connectivity, and the kernel log shows a message at around the same time. We run ZFS in a RAID10 setup created by the installer.

This message shows up in kern.log:

Large kmem_alloc(35496, 0x1000), please file an issue at:
CPU: 3 PID: 251 Comm: zvol Tainted: P O 4.2.0-1-pve #1
Hardware name: Supermicro X10SL7-F/X10SL7-F, BIOS 2.00 04/24/2014
0000000000000000 ffff8807ef8abbd8 ffffffff817c76d3 ffff88081fcd1178
000000000000c210 ffff8807ef8abc28 ffffffffc00d6c39 0000000000000017
00000000f7f59a98 ffff8807ef8abc18 ffff8807faaa6b88 00000005f4adf000
Call Trace:
[<ffffffff817c76d3>] dump_stack+0x45/0x57
[<ffffffffc00d6c39>] spl_kmem_zalloc+0x159/0x1c0 [spl]
[<ffffffffc01b2df1>] dmu_buf_hold_array_by_dnode+0xa1/0x4b0 [zfs]
[<ffffffffc01b32dd>] dmu_buf_hold_array+0x5d/0x80 [zfs]
[<ffffffffc01b4585>] dmu_write_req+0x65/0x1c0 [zfs]
[<ffffffffc0254ea7>] zvol_write+0x117/0x430 [zfs]
[<ffffffffc00da2b1>] taskq_thread+0x221/0x450 [spl]
[<ffffffff8109f530>] ? wake_up_q+0x70/0x70
[<ffffffffc00da090>] ? taskq_cancel_id+0x110/0x110 [spl]
[<ffffffff810947ab>] kthread+0xdb/0x100
[<ffffffff810946d0>] ? kthread_create_on_node+0x1c0/0x1c0
[<ffffffff817ce55f>] ret_from_fork+0x3f/0x70
[<ffffffff810946d0>] ? kthread_create_on_node+0x1c0/0x1c0
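
To correlate these warnings with the outages, it can help to pull them out of kern.log programmatically. The following is a minimal sketch, assuming the warning always has the form shown above and that the task name appears on a later "Comm:" line; the function name `find_large_allocs` and the two regular expressions are my own and not part of any ZFS tooling.

```python
import re

# Matches the SPL warning, e.g. "Large kmem_alloc(35496, 0x1000), please file ..."
WARN_RE = re.compile(r"Large kmem_alloc\((\d+), (0x[0-9a-fA-F]+)\)")
# Matches the task name on the "CPU: ... Comm: zvol ..." line (assumed layout)
COMM_RE = re.compile(r"\bComm: (\S+)")


def find_large_allocs(lines):
    """Return (size_bytes, flags, comm) for each warning found.

    comm is None when no "Comm:" line follows the warning.
    """
    results = []
    pending = None  # (size, flags) waiting for its "Comm:" line
    for line in lines:
        warn = WARN_RE.search(line)
        if warn:
            if pending is not None:
                results.append(pending + (None,))
            pending = (int(warn.group(1)), int(warn.group(2), 16))
            continue
        if pending is not None:
            comm = COMM_RE.search(line)
            if comm:
                results.append(pending + (comm.group(1),))
                pending = None
    if pending is not None:
        results.append(pending + (None,))
    return results


# Sample taken from the log excerpt in this post
sample = [
    "Large kmem_alloc(35496, 0x1000), please file an issue at:",
    "CPU: 3 PID: 251 Comm: zvol Tainted: P O 4.2.0-1-pve #1",
]
print(find_large_allocs(sample))
```

On a real system the same function could be fed the lines of /var/log/kern.log instead of the sample list.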

It seems that this particular error may have been fixed upstream in ZFS on Linux on 28 August, with the fix for issue 3684.

Has anyone else seen this error, and does anyone know how to handle it?
 
Update to stable 4.0 (including the ZFS updates) and try again.
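
After updating, one way to check which ZFS module version is actually loaded is to read it from sysfs. This is only a sketch: it assumes the module exposes its version at /sys/module/zfs/version in a form like "0.6.5.1-4", and the helper names and the `minimum` value in the comment are hypothetical. Which release contains the fix is not something this snippet can tell you.

```python
from pathlib import Path


def parse_zfs_version(text):
    """Turn a string like '0.6.5.1-4' into (0, 6, 5, 1).

    Everything after the first '-' (the package revision) is dropped.
    """
    upstream = text.strip().split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def loaded_zfs_version(path="/sys/module/zfs/version"):
    """Return the loaded module version as a tuple, or None if the file is absent."""
    version_file = Path(path)
    if not version_file.exists():
        return None
    return parse_zfs_version(version_file.read_text())


# Example comparison; the minimum is a placeholder chosen by the reader:
# minimum = (0, 6, 5)
# version = loaded_zfs_version()
# print(version is not None and version >= minimum)
```

The module has to be reloaded (or the server rebooted) after the package update for the new version to show up here.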
 
Hello Tom,

Thanks for the reply. Could you confirm that this commit made it into the PVE 4 release?

Dietmar Maurer [Thu, 24 Sep 2015 10:47:22 +0000]
update pkg-zfs to master/debian/jessie/0.6.5.1-4

Best regards,
Gerrit Venema
WIND Internet