This is a followup to my post at https://forum.proxmox.com/threads/p...3-latest-zfs-lxc-2-1.36943/page-4#post-184486, which I have copied below:
I just updated one of my boxes from Linux 4.10.17-3-pve #1 SMP PVE 4.10.17-23 to Linux 4.13.4-1-pve #1 SMP PVE 4.13.4-25, and sadly Infiniband is no longer working on this box. Below is the kernel oops (a NULL pointer dereference in ib_core) reported in syslog:
Code:
Oct 16 13:17:50 C6100-1-N4 OpenSM[3770]: SM port is down
Oct 16 13:17:50 C6100-1-N4 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Oct 16 13:17:50 C6100-1-N4 kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel: PGD 0
Oct 16 13:17:50 C6100-1-N4 kernel: P4D 0
Oct 16 13:17:50 C6100-1-N4 kernel:
Oct 16 13:17:50 C6100-1-N4 kernel: Oops: 0002 [#1] SMP
Oct 16 13:17:50 C6100-1-N4 kernel: Modules linked in: iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 softdog nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c nfnetlink_log nfnetlink ib_ipoib rdma_ucm ib_umad ib_uverbs bonding 8021q garp ipmi_ssif mrp intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_pcm aesni_intel ast aes_x86_64 snd_timer crypto_simd ttm glue_helper snd cryptd dcdbas drm_kms_helper soundcore intel_cstate pcspkr drm joydev input_leds i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ib_mthca lpc_ich ioatdma i5500_temp i7core_edac shpchp mac_hid ipmi_si ipmi_devintf ipmi_msghandler vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
Oct 16 13:17:50 C6100-1-N4 kernel: ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor raid6_pq hid_generic usbmouse usbkbd usbhid hid igb(O) ahci dca mpt3sas raid_class ptp i2c_i801 libahci scsi_transport_sas pps_core
Oct 16 13:17:50 C6100-1-N4 kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P IO 4.13.4-1-pve #1
Oct 16 13:17:50 C6100-1-N4 kernel: Hardware name: Dell XS23-TY3 /9CMP63, BIOS 1.71 09/17/2013
Oct 16 13:17:50 C6100-1-N4 kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000
Oct 16 13:17:50 C6100-1-N4 kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286
Oct 16 13:17:50 C6100-1-N4 kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000
Oct 16 13:17:50 C6100-1-N4 kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20
Oct 16 13:17:50 C6100-1-N4 kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0
Oct 16 13:17:50 C6100-1-N4 kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38
Oct 16 13:17:50 C6100-1-N4 kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880
Oct 16 13:17:50 C6100-1-N4 kernel: FS: 0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000
Oct 16 13:17:50 C6100-1-N4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 16 13:17:50 C6100-1-N4 kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0
Oct 16 13:17:50 C6100-1-N4 kernel: Call Trace:
Oct 16 13:17:50 C6100-1-N4 kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel:  __ib_process_cq+0x5c/0xb0 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel:  ib_cq_poll_work+0x20/0x60 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel:  process_one_work+0x1e9/0x410
Oct 16 13:17:50 C6100-1-N4 kernel:  worker_thread+0x4b/0x410
Oct 16 13:17:50 C6100-1-N4 kernel:  kthread+0x109/0x140
Oct 16 13:17:50 C6100-1-N4 kernel:  ? process_one_work+0x410/0x410
Oct 16 13:17:50 C6100-1-N4 kernel:  ? kthread_create_on_node+0x70/0x70
Oct 16 13:17:50 C6100-1-N4 kernel:  ? SyS_exit_group+0x14/0x20
Oct 16 13:17:50 C6100-1-N4 kernel:  ret_from_fork+0x25/0x30
Oct 16 13:17:50 C6100-1-N4 kernel: Code: 28 00 00 00 48 89 45 e8 31 c0 4c 89 65 d8 48 8b 57 28 48 8d 47 28 4c 89 65 e0 48 39 d0 74 23 48 8b 77 28 48 8b 4f 30 48 8b 55 d8 <4c> 89 66 08 48 89 75 d8 48 89 11 48 89 4a 08 48 89 47 28 48 89
Oct 16 13:17:50 C6100-1-N4 kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38
Oct 16 13:17:50 C6100-1-N4 kernel: CR2: 0000000000000008
Oct 16 13:17:50 C6100-1-N4 kernel: ---[ end trace 937ca6a9fe8de56f ]---
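In case it helps anyone triaging a dump like the one above, here is a minimal Python sketch that pulls the faulting symbol and the reliable call-trace frames out of syslog lines. The function name `summarize_oops` and both regexes are my own illustration, not part of any tool. They assume the oops layout shown above (an `IP:` line followed by a `Call Trace:` block); other kernel versions format these lines differently.

```python
import re

# One syslog entry from the kernel: timestamp, host, "kernel:", message.
# Assumption: the "Mon DD HH:MM:SS host kernel: ..." layout seen in the log above.
ENTRY = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s?(.*)$")
# A stack frame such as "ib_mad_recv_done+0x5cc/0xb50 [ib_core]",
# optionally prefixed with "?" for frames the unwinder is unsure about.
FRAME = re.compile(
    r"^\??\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?$"
)


def summarize_oops(lines):
    """Return (faulting_symbol, module, call_trace) from syslog lines."""
    fault, module, trace, in_trace = None, None, [], False
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel message (e.g. the OpenSM line)
        msg = m.group(1).strip()
        if msg.startswith("IP:") and fault is None:
            f = FRAME.match(msg[3:].strip())
            if f:
                fault, module = f.group(1), f.group(2)
        elif msg == "Call Trace:":
            in_trace = True
        elif in_trace:
            f = FRAME.match(msg)
            if f is None:
                in_trace = False  # first non-frame line ends the trace
            elif not msg.startswith("?"):
                trace.append(f.group(1))  # keep only reliable frames
    return fault, module, trace


sample = [
    "Oct 16 13:17:50 C6100-1-N4 OpenSM[3770]: SM port is down",
    "Oct 16 13:17:50 C6100-1-N4 kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]",
    "Oct 16 13:17:50 C6100-1-N4 kernel: Call Trace:",
    "Oct 16 13:17:50 C6100-1-N4 kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]",
    "Oct 16 13:17:50 C6100-1-N4 kernel:  ? process_one_work+0x410/0x410",
    "Oct 16 13:17:50 C6100-1-N4 kernel:  ret_from_fork+0x25/0x30",
    "Oct 16 13:17:50 C6100-1-N4 kernel: Code: 28 00 00 00",
]
print(summarize_oops(sample))
```

On the excerpt in `sample`, this reports `ib_free_recv_mad` in `ib_core` as the faulting symbol, which matches the `IP:` and `RIP:` lines in the full trace.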
My hardware is as follows:
Dell C6100
2x Intel(R) Xeon(R) CPU X5650
Mellanox Technologies MT25208 [InfiniHost III Ex] (rev 20)
Per the request at https://forum.proxmox.com/threads/p...3-latest-zfs-lxc-2-1.36943/page-4#post-184610 I went ahead and tried booting a mainline kernel. The problem is that I am running ZFS, so booting a mainline kernel causes the ZFS modules to fail to load, which makes the system unbootable.