Thanks for the update, @fabian. I reached out to the Mellanox engineers to see if I can get an update on when the other patch will be merged into mainline.
EDIT: The 2nd patch is currently pending at...
@fabian Working off-list with the Mellanox developer, I was able to get this fixed. It seems there were two different bugs involved. Patches for each are below; both were tested on pve-kernel by placing them in the kernel patches folder.
@fabian So far I can confirm disabling CONFIG_SECURITY_INFINIBAND resolves the issue on 4.13.* kernels. Can you please consider making this change to pve-kernel?
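For anyone who wants to check whether their running kernel was built with this option, here is a minimal sketch. It assumes the build config is available at /boot/config-<release>, as is usual on Debian-based systems; the helper name `kernel_option_state` is made up for illustration.

```python
import platform
from pathlib import Path


def kernel_option_state(config_text: str, option: str) -> str:
    """Return the value of a kernel config option ('y', 'm', ...).

    Unset options appear as '# CONFIG_FOO is not set' or are missing
    entirely; both cases are reported as 'n'.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "n"


if __name__ == "__main__":
    # Assumed location of the build config for the running kernel.
    path = Path("/boot") / f"config-{platform.release()}"
    if path.exists():
        state = kernel_option_state(path.read_text(), "CONFIG_SECURITY_INFINIBAND")
        print(f"CONFIG_SECURITY_INFINIBAND={state}")
    else:
        print(f"{path} not found")
```

A result of `n` means the option is disabled in that build.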
As for whether this fixes the issue: per the mailing list thread, it sounds like AppArmor may be the actual root cause. The...
@fabian For some reason I am not able to reproduce this issue the way I could in the past, or at least not at the same frequency. I will test these kernels once I figure out how to reproduce the problem. I was also asked on the mailing list to try disabling CONFIG_SECURITY_INFINIBAND [0], which...
Just as an update, sadly this isn't really fixed. There still seems to be an issue, but I can't track it down. As the issue is upstream in Linux and is not directly an issue with Proxmox's kernel, I have moved the issue to the linux-rdma mailing list at...
It seems I was able to resolve the issue. Specifically, I followed @alexskysilk's advice on opensm and also upgraded my NAS to a 4.13 mainline kernel so everything matched. Once done, NFS over RDMA started working again as expected. :)
I installed Proxmox on a separate SSD and tested my setup on a few different kernels. Below are the results:
4.10.17-2-pve = Works as expected
4.13.4-1-pve = Kernel Panics
4.13.8 Mainline = Kernel Panics
4.12.14 Mainline = Works as expected
4.14.0-rc5 Mainline = Kernel Panics
Sadly I run...
I will see if I can get some time within the next week to test on a clean ext4 install.
When opensm is disabled, there is no more kernel panic, but InfiniBand obviously fails to connect to my storage (NFS via RDMA), and dmesg still shows errors from the InfiniBand driver:
As soon as...
This is a followup to my post at https://forum.proxmox.com/threads/planning-proxmox-ve-5-1-ceph-luminous-kernel-4-13-latest-zfs-lxc-2-1.36943/page-4#post-184486, which I have copied below:
Per the request at...
Just updated one of my boxes from Linux 4.10.17-3-pve #1 SMP PVE 4.10.17-23 to Linux 4.13.4-1-pve #1 SMP PVE 4.13.4-25, and sadly InfiniBand is no longer working on this box. Below is the kernel panic reported in syslog:
Oct 16 13:17:50 C6100-1-N4 OpenSM[3770]: SM port is down
Oct 16 13:17:50...
Can you please share the following information:
How many nodes do you have?
Are they all configured in the same cluster? (I assume so?)
Are your droplets configured in HA?
Are all nodes in the same HA group?
It sounds like you may have lost quorum which can cause HA droplets to power off until...
Hello, I just have a quick general question. I recently created a basic Python startup service that uses the Proxmox API to provision Debian VMs on boot, which relies on using the VM notes section for any boot script that needs to be run. Because of this, I was curious if there is a character limit...
Hello,
Recently I have been having issues with one of my VM instances locking up, and am having a hard time diagnosing the cause. Currently the plan is to just run "qm terminal 1**" in a screen session, but this is not the best way as it does not persist across reboots of the hypervisor.
With...
Looking good so far!
My only request is that the information shown on this page also be queryable via the API. This would make it easy to automate auto-balancing and scaling across an environment, as well as to monitor your cluster at a high level.
Yeah, the reason I only use console=ttyS0 is that my Proxmox host has no GPU (embedded or external), so the only way in is via console; thus there are no ttyS "interfaces" available to me.
Have you confirmed that your grub config in /boot was updated? You can force this using the "update-grub2" command as root. I also personally remove "quiet" as this suppresses most of the kernel's boot information. If it helps, here is the config I use on my system:
# If you change this file...
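As a rough illustration of the "remove quiet" step mentioned above, the sketch below strips the `quiet` flag from the `GRUB_CMDLINE_LINUX_DEFAULT` line of a GRUB defaults file. It assumes the usual Debian layout where that variable lives in /etc/default/grub; the function name `drop_quiet` is hypothetical, and it only transforms text that is passed in, so nothing on disk is changed.

```python
import re


def drop_quiet(grub_defaults: str) -> str:
    """Remove 'quiet' from GRUB_CMDLINE_LINUX_DEFAULT; leave other lines untouched."""

    def fix(match: re.Match) -> str:
        # Keep every kernel argument except the literal 'quiet' flag.
        args = [arg for arg in match.group(2).split() if arg != "quiet"]
        joined = " ".join(args)
        return f'{match.group(1)}"{joined}"'

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"',
        fix,
        grub_defaults,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet console=ttyS0"\n'
    print(drop_quiet(sample))
```

After changing the real file by hand, the GRUB config still has to be regenerated with "update-grub2" as root, as noted above.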
Awesome! :)
As for the ROM error, this can be ignored as it won't affect the functionality. As for the patch, I went ahead and submitted it upstream at http://comments.gmane.org/gmane.linux.kernel.pci/52039
EDIT: Patch was merged, targeted for Linux 4.8 release.
EDIT2: Patch was accepted...