Kernel Panic (General Protection Fault) on Proxmox 4.0 with ZFS

tino.l

Member
Nov 3, 2015
Hello,

Two of our newer hosts running PVE 4.0 encountered a kernel panic under high load. Here is a log excerpt:

Code:
Oct 30 03:00:11 host kernel: [925383.376718] general protection fault: 0000 [#1] SMP 
Oct 30 03:00:11 host kernel: [925383.377299] Modules linked in: act_police cls_u32 sch_ingress sch_htb ip_set nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6table_filter ip6_tables xt_comment xt_tcpudp nfnetlink_log iptable_filter nfnetlink ip_tables x_tables zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) intel_rapl iTCO_wdt iosf_mbi iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_pcm snd_timer snd soundcore pcspkr ast ttm drm_kms_helper sb_edac drm joydev input_leds syscopyarea edac_core sysfillrect sysimgblt mei_me i2c_i801 mei shpchp ioatdma lpc_ich wmi ipmi_ssif 8250_fintek ipmi_msghandler acpi_power_meter acpi_pad mac_hid vhost_net vhost macvtap macvlan autofs4 raid1 hid_generic usbkbd usbmouse igb i2c_algo_bit dca ahci ptp usbhid libahci pps_core hid
Oct 30 03:00:11 host kernel: [925383.383494] CPU: 3 PID: 2063 Comm: z_wr_int_7 Tainted: P           O    4.2.2-1-pve #1
Oct 30 03:00:11 host kernel: [925383.384275] Hardware name: Supermicro Super Server/X10DRL-i, BIOS 1.1a 07/11/2015
Oct 30 03:00:11 host kernel: [925383.385055] task: ffff8827da236040 ti: ffff8827da258000 task.ti: ffff8827da258000
Oct 30 03:00:11 host kernel: [925383.385860] RIP: 0010:[<ffffffff811d1841>]  [<ffffffff811d1841>] __kmalloc_node+0x1f1/0x300
Oct 30 03:00:11 host kernel: [925383.386707] RSP: 0018:ffff8827da25baf8  EFLAGS: 00010246
Oct 30 03:00:11 host kernel: [925383.387564] RAX: 0000000000000000 RBX: 000000000000c210 RCX: 0000000004bfe3a2
Oct 30 03:00:11 host kernel: [925383.388449] RDX: 0000000004bfe3a1 RSI: 0000000000000000 RDI: 0000000000000017
Oct 30 03:00:11 host kernel: [925383.389340] RBP: ffff8827da25bb48 R08: 0000000000019ea0 R09: ffff88142f403540
Oct 30 03:00:11 host kernel: [925383.390252] R10: ffff88142f403540 R11: ffffffffc03a7ac3 R12: 000000000000c210
Oct 30 03:00:11 host kernel: [925383.391188] R13: 00000000000000c0 R14: 00000000ffffffff R15: 00ffff882457ba7e
Oct 30 03:00:11 host kernel: [925383.392143] FS:  0000000000000000(0000) GS:ffff88142fac0000(0000) knlGS:0000000000000000
Oct 30 03:00:11 host kernel: [925383.393123] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 30 03:00:11 host kernel: [925383.394115] CR2: 0000008600344000 CR3: 0000002528153000 CR4: 00000000001426e0
Oct 30 03:00:11 host kernel: [925383.395134] Stack:
Oct 30 03:00:11 host kernel: [925383.396158]  ffff8827da236040 0000000000000000 ffffffffc03a7ac3 ffff88142f403540
Oct 30 03:00:11 host kernel: [925383.397239]  0000000000000000 000000000000c210 0000000000000000 0000000000000000
Oct 30 03:00:11 host kernel: [925383.398349]  0000000000000000 00000000000000c0 ffff8827da25bb98 ffffffffc03a7ac3
Oct 30 03:00:11 host kernel: [925383.399473] Call Trace:
Oct 30 03:00:11 host kernel: [925383.400603]  [<ffffffffc03a7ac3>] ? spl_kmem_zalloc+0xa3/0x180 [spl]
Oct 30 03:00:11 host kernel: [925383.401766]  [<ffffffffc03a7ac3>] spl_kmem_zalloc+0xa3/0x180 [spl]
Oct 30 03:00:11 host kernel: [925383.402992]  [<ffffffffc0595b60>] __vdev_disk_physio+0x60/0x450 [zfs]
Oct 30 03:00:11 host kernel: [925383.404184]  [<ffffffff8101cc99>] ? read_tsc+0x9/0x10
Oct 30 03:00:11 host kernel: [925383.405386]  [<ffffffff810e6aad>] ? getrawmonotonic64+0x2d/0xc0
Oct 30 03:00:11 host kernel: [925383.406648]  [<ffffffffc0596416>] vdev_disk_io_start+0x96/0x200 [zfs]
Oct 30 03:00:11 host kernel: [925383.407932]  [<ffffffffc05d29a3>] zio_vdev_io_start+0xa3/0x2d0 [zfs]
Oct 30 03:00:11 host kernel: [925383.409221]  [<ffffffffc05d3aee>] zio_execute+0xde/0x190 [zfs]
Oct 30 03:00:11 host kernel: [925383.410504]  [<ffffffffc059a67b>] vdev_queue_io_done+0x17b/0x250 [zfs]
Oct 30 03:00:11 host kernel: [925383.411805]  [<ffffffffc05d2808>] zio_vdev_io_done+0x88/0x180 [zfs]
Oct 30 03:00:11 host kernel: [925383.413116]  [<ffffffffc05d3aee>] zio_execute+0xde/0x190 [zfs]
Oct 30 03:00:11 host kernel: [925383.414422]  [<ffffffffc03ab070>] taskq_thread+0x230/0x420 [spl]
Oct 30 03:00:11 host kernel: [925383.415724]  [<ffffffff810a0570>] ? wake_up_q+0x70/0x70
Oct 30 03:00:11 host kernel: [925383.417041]  [<ffffffffc03aae40>] ? taskq_cancel_id+0x110/0x110 [spl]
Oct 30 03:00:11 host kernel: [925383.418373]  [<ffffffff810957db>] kthread+0xdb/0x100
Oct 30 03:00:11 host kernel: [925383.419675]  [<ffffffff81095700>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 30 03:00:11 host kernel: [925383.420995]  [<ffffffff817d019f>] ret_from_fork+0x3f/0x70
Oct 30 03:00:11 host kernel: [925383.422321]  [<ffffffff81095700>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 30 03:00:11 host kernel: [925383.423668] Code: 89 e1 4c 8b 45 c0 4c 89 e9 48 89 da 48 8b 75 c8 41 ff d2 4d 8b 17 4d 85 d2 75 d8 e9 23 ff ff ff 49 63 42 20 48 8d 4a 01 4d 8b 02 <49> 8b 1c 07 4c 89 f8 65 49 0f c7 08 0f 94 c0 84 c0 0f 84 5d fe 
Oct 30 03:00:11 host kernel: [925383.427972]  RSP <ffff8827da25baf8>
Oct 30 03:00:11 host kernel: [925383.639546] ---[ end trace 205ba4c785a3ab25 ]---

The error looks very similar to this ZFS bug report: github.com/zfsonlinux/zfs/issues/3933

The ZFS packages in the test repository still appear to be at version 0.6.5.2. Are there plans to ship 0.6.5.3 any time soon? The source packages have been available for three weeks now.
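
For reference, this is roughly how I check which ZFS/SPL build is actually loaded and how the pvetest repository can be enabled. It is only a sketch: the repository line assumes a PVE 4.x host based on Debian Jessie, and the file name pvetest.list is just an example.

Code:
# version of the currently loaded kernel modules
cat /sys/module/zfs/version
cat /sys/module/spl/version

# installed ZFS-related packages
dpkg -l | grep -Ei 'zfs|spl'

# enable the pvetest repository (assumes PVE 4.x / Debian Jessie)
echo "deb http://download.proxmox.com/debian jessie pvetest" > /etc/apt/sources.list.d/pvetest.list
apt-get update && apt-get dist-upgrade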

Thanks.
 
We will upload a new ZFS version and kernel to our pvetest repo later today. Please test and let us know whether the new ZFS version fixes your issue.
 
Thank you. I will set up a test to run over the weekend and will report back on Monday.
 
The results look promising: the test machine survived a combination of stressapptest, a fio benchmark, and a ZFS scrub over the last three days. I would consider this issue solved. Thank you for your good work! :)
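
In case anyone wants to reproduce a similar load, this is the general shape of what I ran in parallel. The pool name "tank", the test directory, and the sizes/runtimes are placeholders, not my exact setup; adjust them to your hardware.

Code:
# memory/CPU stress (duration in seconds, amount of test memory in MB)
stressapptest -s 3600 -M 512

# random-write I/O load on a dataset of the ZFS pool
fio --name=zfs-stress --directory=/tank/fio --ioengine=libaio --rw=randwrite \
    --bs=4k --size=4G --numjobs=4 --runtime=3600 --time_based --group_reporting

# scrub the pool while the load is running, then check its state
zpool scrub tank
zpool status tank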
 
