PVE 5.1 and Infiniband Issues

riptide_wave

This is a follow-up to my post at https://forum.proxmox.com/threads/p...3-latest-zfs-lxc-2-1.36943/page-4#post-184486, which I have copied below:
I just updated one of my boxes from Linux 4.10.17-3-pve #1 SMP PVE 4.10.17-23 to Linux 4.13.4-1-pve #1 SMP PVE 4.13.4-25, and sadly InfiniBand is no longer working on this box. Below is the kernel panic reported in syslog:

Code:
Oct 16 13:17:50 C6100-1-N4 OpenSM[3770]: SM port is down
Oct 16 13:17:50 C6100-1-N4 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Oct 16 13:17:50 C6100-1-N4 kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel: PGD 0
Oct 16 13:17:50 C6100-1-N4 kernel: P4D 0
Oct 16 13:17:50 C6100-1-N4 kernel:
Oct 16 13:17:50 C6100-1-N4 kernel: Oops: 0002 [#1] SMP
Oct 16 13:17:50 C6100-1-N4 kernel: Modules linked in: iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 softdog nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c nfnetlink_log nfnetlink ib_ipoib rdma_ucm ib_umad ib_uverbs bonding 8021q garp ipmi_ssif mrp intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_pcm aesni_intel ast aes_x86_64 snd_timer crypto_simd ttm glue_helper snd cryptd dcdbas drm_kms_helper soundcore intel_cstate pcspkr drm joydev input_leds i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ib_mthca lpc_ich ioatdma i5500_temp i7core_edac shpchp mac_hid ipmi_si ipmi_devintf ipmi_msghandler vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
Oct 16 13:17:50 C6100-1-N4 kernel:  ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor raid6_pq hid_generic usbmouse usbkbd usbhid hid igb(O) ahci dca mpt3sas raid_class ptp i2c_i801 libahci scsi_transport_sas pps_core
Oct 16 13:17:50 C6100-1-N4 kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P          IO    4.13.4-1-pve #1
Oct 16 13:17:50 C6100-1-N4 kernel: Hardware name: Dell       XS23-TY3        /9CMP63, BIOS 1.71 09/17/2013
Oct 16 13:17:50 C6100-1-N4 kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000
Oct 16 13:17:50 C6100-1-N4 kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286
Oct 16 13:17:50 C6100-1-N4 kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000
Oct 16 13:17:50 C6100-1-N4 kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20
Oct 16 13:17:50 C6100-1-N4 kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0
Oct 16 13:17:50 C6100-1-N4 kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38
Oct 16 13:17:50 C6100-1-N4 kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880
Oct 16 13:17:50 C6100-1-N4 kernel: FS:  0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000
Oct 16 13:17:50 C6100-1-N4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 16 13:17:50 C6100-1-N4 kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0
Oct 16 13:17:50 C6100-1-N4 kernel: Call Trace:
Oct 16 13:17:50 C6100-1-N4 kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel:  __ib_process_cq+0x5c/0xb0 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel:  ib_cq_poll_work+0x20/0x60 [ib_core]
Oct 16 13:17:50 C6100-1-N4 kernel:  process_one_work+0x1e9/0x410
Oct 16 13:17:50 C6100-1-N4 kernel:  worker_thread+0x4b/0x410
Oct 16 13:17:50 C6100-1-N4 kernel:  kthread+0x109/0x140
Oct 16 13:17:50 C6100-1-N4 kernel:  ? process_one_work+0x410/0x410
Oct 16 13:17:50 C6100-1-N4 kernel:  ? kthread_create_on_node+0x70/0x70
Oct 16 13:17:50 C6100-1-N4 kernel:  ? SyS_exit_group+0x14/0x20
Oct 16 13:17:50 C6100-1-N4 kernel:  ret_from_fork+0x25/0x30
Oct 16 13:17:50 C6100-1-N4 kernel: Code: 28 00 00 00 48 89 45 e8 31 c0 4c 89 65 d8 48 8b 57 28 48 8d 47 28 4c 89 65 e0 48 39 d0 74 23 48 8b 77 28 48 8b 4f 30 48 8b 55 d8 <4c> 89 66 08 48 89 75 d8 48 89 11 48 89 4a 08 48 89 47 28 48 89
Oct 16 13:17:50 C6100-1-N4 kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38
Oct 16 13:17:50 C6100-1-N4 kernel: CR2: 0000000000000008
Oct 16 13:17:50 C6100-1-N4 kernel: ---[ end trace 937ca6a9fe8de56f ]---

My hardware is as follows:
Dell C6100
2x Intel(R) Xeon(R) CPU X5650
Mellanox Technologies MT25208 [InfiniHost III Ex] (rev 20)

Per the request at https://forum.proxmox.com/threads/p...3-latest-zfs-lxc-2-1.36943/page-4#post-184610 I went ahead and tried booting a mainline kernel. The problem is that I am running ZFS, so a mainline kernel leaves the ZFS modules unable to load, making the system unbootable.
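Before rebooting into a non-PVE kernel, a quick check (sketch only; the version string below is just a placeholder) can confirm whether a ZFS module even exists for it, since PVE ships ZFS inside its own kernel packages and mainline builds normally will not have one:

Code:
# placeholder version string - substitute the kernel you intend to boot
KVER=4.13.8-041308-generic
find /lib/modules/$KVER -name 'zfs.ko*' 2>/dev/null | grep -q . \
  && echo "ZFS module present for $KVER" \
  || echo "no ZFS module for $KVER - a root-on-ZFS system will not boot it"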
 

That is unfortunate - we don't have IB hardware to test, so without a way to narrow this down further it might be hard to fix (maybe you are able to put together a test environment on a spare disk? network boot?). That being said, there have been a few Mellanox/IB-related commits in the 4.13 stable tree...
 

I will see if I can get some time within the next week to test on a clean ext4 install.

The kernel panic appears to be related to the subnet manager; if you disable it, does the kernel panic go away?

With opensm disabled there is no more kernel panic, but InfiniBand obviously fails to connect to my storage (NFS over RDMA), and dmesg still shows errors from the InfiniBand driver:

Code:
[ 21.083218] infiniband mthca0: ib_post_send_mad error
[ 21.301744] infiniband mthca0: ib_post_send_mad error
[ 21.505768] infiniband mthca0: ib_post_send_mad error
[ 21.709765] infiniband mthca0: ib_post_send_mad error
[ 31.082987] infiniband mthca0: ib_post_send_mad error

As soon as OpenSM is started, the kernel panic occurs.
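For anyone reproducing this, a minimal way to toggle the subnet manager for the test, assuming opensm was installed from the distribution package and is managed as a service (the service name may differ on other setups):

Code:
systemctl stop opensm      # with no SM sending MADs, the node no longer panics (per the report above)
systemctl disable opensm   # keep it off across reboots while testing
ibstat                     # ports will not reach the Active state without an SM on the fabric
systemctl start opensm     # starting it again reproduces the crash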
 
That's good news, sort of. The SM doesn't need to run on this node - it can run on any IB-attached machine (I normally run it on the switches).

If you're dead set on running it on a Proxmox 5 node, you probably need to wait for OFED 4.2.
 

I installed Proxmox on a separate SSD and tested my setup on a few different kernels. Below are the results:
4.10.17-2-pve = Works as expected
4.13.4-1-pve = Kernel Panics
4.13.8 Mainline = Kernel Panics
4.12.14 Mainline = Works as expected
4.14.0-rc5 Mainline = Kernel Panics
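In case anyone wants to repeat these tests, a rough sketch for installing one of the mainline builds, assuming the packages come from the Ubuntu mainline kernel archive (exact file names vary per version, and the original tests may have used a different source):

Code:
# first download the linux-headers/linux-image .deb files for the wanted
# version from https://kernel.ubuntu.com/~kernel-ppa/mainline/
dpkg -i linux-headers-*_all.deb linux-headers-*-generic_*_amd64.deb \
        linux-image-*-generic_*_amd64.deb
reboot   # then select the new kernel from the GRUB menu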


Sadly, I run InfiniBand directly between my NAS and compute nodes (I run a very small environment), so from my understanding I need OpenSM on each node for IPoIB to work correctly.
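For back-to-back links like this, one approach (sketched below; the options assume a stock opensm build) is to bind one opensm instance to the local port GUID of each point-to-point link instead of letting a single instance try to manage everything:

Code:
ibstat -p                         # list the local port GUIDs
opensm -B -g 0x0002c90200273985   # run a daemonized SM bound to that port GUID
                                  # (the GUID above is only an example value)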
 
It seems I was able to resolve the issue. Specifically, I followed @alexskysilk's advice on opensm and also upgraded my NAS to a 4.13 mainline kernel so everything matched. Once that was done, NFS over RDMA started working again as expected. :)
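For reference, a minimal NFS-over-RDMA sketch; the export path and address below are made up for illustration, and 20049 is the conventional NFS/RDMA port:

Code:
# server side: tell knfsd to also listen for RDMA connections
modprobe svcrdma
echo "rdma 20049" > /proc/fs/nfsd/portlist

# client side: mount across the IB link using the RDMA transport
modprobe xprtrdma
mount -t nfs -o proto=rdma,port=20049,vers=3 192.168.10.1:/tank/vmstore /mnt/vmstore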
 
If you could also test 4.12.0 and 4.13.0, we might be able to narrow down the range of potentially responsible commits...
 

@fabian For some reason I am not able to reproduce this issue like I could in the past, or at least not at the same frequency. I will test these kernels once I figure out how to reproduce the problem again. I was also asked on the mailing list to try disabling CONFIG_SECURITY_INFINIBAND [0], which I will try on the latest PVE kernel to see if it has any effect on the issue.

[0]: https://marc.info/?l=linux-rdma&m=150934800419293&w=2

EDIT: Got it to occur again.
4.12.0 Mainline = Works as expected
4.13.0 Mainline = Kernel Panics

EDIT2: I went ahead and compiled a PVE kernel with CONFIG_SECURITY_INFINIBAND disabled (using PVE_CONFIG_OPTS) and so far this seems to fix the issue. I will run this for a week or so to verify all is stable.
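For anyone wanting to try the same workaround, a rough outline of the rebuild, assuming the pve-kernel source layout of that time (where the Makefile passes the PVE_CONFIG_OPTS list of scripts/config flags into the build):

Code:
git clone git://git.proxmox.com/git/pve-kernel.git
cd pve-kernel
# edit the Makefile and append '-d CONFIG_SECURITY_INFINIBAND' to the
# PVE_CONFIG_OPTS list, then build (this needs the usual build dependencies)
make
dpkg -i pve-kernel-*.deb   # install the rebuilt kernel and reboot into it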
 
@fabian So far I can confirm disabling CONFIG_SECURITY_INFINIBAND resolves the issue on 4.13.* kernels. Can you please consider making this change to pve-kernel?

As for why this fixes the issue: per the mailing list thread, it sounds like AppArmor may be the actual root cause. The above should be considered a workaround until the real root cause is found and resolved.
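A quick way to check whether a given installed kernel was built with the option:

Code:
grep CONFIG_SECURITY_INFINIBAND /boot/config-$(uname -r)
# "=y" means the InfiniBand LSM hooks are compiled in; "is not set" means disabled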
 
I will wait a little bit longer to see if a more complete fix is developed (thanks for working directly with the upstream devs, btw!), and if not, apply the workaround on the next kernel update.
 
@fabian Via off-list channels I was able to work with the Mellanox developer to get this fixed. It seems there were two different bugs at play. Patches for each are attached below; they were tested against pve-kernel by placing them in the kernel patches folder.
 

Attachments

  • proxmox-ib-patches.zip (1.5 KB)
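For anyone wanting to test these before a packaged kernel ships, the rebuild is the same as for the config workaround above: unpack the patches next to the existing numbered kernel patches in the pve-kernel source (exact location is an assumption based on the repo layout at the time) and rebuild:

Code:
cd pve-kernel
unzip /path/to/proxmox-ib-patches.zip   # drop the patches alongside the existing numbered ones
make                                    # rebuild, then install the resulting .deb packages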
Scope looks limited enough; we'll see about integrating them with the next kernel update.
 
Just as a heads-up: we will probably skip this for this update cycle, as not all of those patches are publicly available yet:

  • your first patch (0007) is actually https://patchwork.kernel.org/patch/10034779/, which has been reviewed and queued upstream
  • your second patch (0008) is not publicly available yet (maybe ping Mellanox about that ;))
  • there is a related patch available at https://patchwork.kernel.org/patch/10067903/ which is currently being reviewed/discussed
 
The latest pve-kernel-4.13.8-3-pve on pvetest contains both patches!
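To pull that build in ahead of the regular repositories, the pvetest repository can be enabled temporarily (repo line shown for PVE 5.x on Debian Stretch; adjust if your setup differs):

Code:
echo "deb http://download.proxmox.com/debian/pve stretch pvetest" \
    > /etc/apt/sources.list.d/pvetest.list
apt update
apt install pve-kernel-4.13.8-3-pve
# remove the pvetest.list file again afterwards if you do not want to keep tracking pvetest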
 
