Thanks for the update @fabian. I reached out to the Mellanox engineers to see if I can get any update on the other patch regarding getting it merged into mainline.
EDIT: The 2nd patch is currently pending at...
@fabian Via off-channels I was able to work with the Mellanox developer to get this fixed. It seems there were two different bugs going on. Patches for each are below; they were tested on the pve-kernel by placing them in the kernel patches folder.
@fabian So far I can confirm disabling CONFIG_SECURITY_INFINIBAND resolves the issue on 4.13.* kernels. Can you please consider making this change to pve-kernel?
As for whether this fixes the issue, per the mailing-list thread it sounds like AppArmor may be the actual root cause. The...
@fabian For some reason I am not able to reproduce this issue like I was in the past, or at least not at the same frequency. I will test these kernels once I figure out how to reproduce the problem. I was also asked on the mailing list to try disabling
CONFIG_SECURITY_INFINIBAND [0], which...
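For reference, here is a minimal sketch of how one could check whether a given kernel build has that option enabled, assuming the usual `/boot/config-$(uname -r)` layout used by pve-kernel and mainline packages (the helper name is mine, not from the mailing list):

```python
from pathlib import Path

def ib_lsm_enabled(config_path: str) -> bool:
    """Return True if CONFIG_SECURITY_INFINIBAND=y appears in the given
    kernel config file, e.g. /boot/config-4.13.4-1-pve."""
    text = Path(config_path).read_text()
    return any(line.strip() == "CONFIG_SECURITY_INFINIBAND=y"
               for line in text.splitlines())
```

A disabled option shows up in the config file as the comment `# CONFIG_SECURITY_INFINIBAND is not set`, which this check correctly treats as disabled.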
Just as an update: sadly, this isn't really fixed. There still seems to be an issue, but I can't track it down. As the issue is upstream in Linux rather than specific to Proxmox's kernel, I have moved the discussion to the linux-rdma mailing list at...
It seems I was able to resolve the issue. Specifically, I followed @alexskysilk's advice on opensm, and also upgraded my NAS to a 4.13 mainline kernel so everything matched. Once done, NFS over RDMA started working again as expected. :)
I installed Proxmox on a separate SSD and tested my setup on a few different kernels. Below are the results:
4.10.17-2-pve = Works as expected
4.13.4-1-pve = Kernel Panics
4.13.8 Mainline = Kernel Panics
4.12.14 Mainline = Works as expected
4.14.0-rc5 Mainline = Kernel Panics
Sadly I run...
I will see if I can get some time within the next week to test on a clean ext4 install.
When opensm is disabled, there is no more kernel panic, but InfiniBand obviously fails to connect to my storage (NFS via RDMA), and dmesg still shows errors from the InfiniBand driver:
As soon as...
This is a followup to my post at https://forum.proxmox.com/threads/planning-proxmox-ve-5-1-ceph-luminous-kernel-4-13-latest-zfs-lxc-2-1.36943/page-4#post-184486, which I have copied below:
Per the request at...
Just updated one of my boxes from Linux 4.10.17-3-pve #1 SMP PVE 4.10.17-23 to Linux 4.13.4-1-pve #1 SMP PVE 4.13.4-25, and sadly InfiniBand is no longer working on this box. Below is the kernel panic reported in syslog:
Oct 16 13:17:50 C6100-1-N4 OpenSM[3770]: SM port is down
Oct 16 13:17:50...
Can you please share the following information:
How many nodes do you have?
Are they all configured in the same cluster? (I assume so?)
Are your droplets configured in HA?
Are all nodes in the same HA group?
It sounds like you may have lost quorum, which can cause HA droplets to power off until...
Hello, I just have a quick general question. I recently created a basic Python startup service that uses the Proxmox API to provision Debian VMs on boot, which relies on using the VM notes section for any boot script that needs to be run. Because of this, I was curious if there is a character limit...
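In case it helps frame the question: the service writes the script into the VM's description via the config endpoint (`PUT /api2/json/nodes/{node}/qemu/{vmid}/config`, `description` parameter), and currently guards the write with a size check like the sketch below. The 8 KiB ceiling is purely an assumption on my part, not a documented Proxmox value, and the helper name is mine:

```python
MAX_NOTES_BYTES = 8192  # assumed ceiling; not a documented Proxmox limit

def fits_in_notes(boot_script: str, limit: int = MAX_NOTES_BYTES) -> bool:
    """Return True if the script's UTF-8 encoding fits within the assumed
    size limit of the VM description ("notes") field."""
    return len(boot_script.encode("utf-8")) <= limit
```

I check the encoded byte length rather than `len()` because any multi-byte characters in the script would otherwise slip past a character-count check.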