OFED for PVE

They most definitely were not removed when I removed mstflint. Autoremove removed only three libraries, I can't recall their names but they were related to the unneeded mstflint. I have never grown a neckbeard so I don't have the long history with the evolution of package management from compiling your own from source to using a package manager with a nice gui in KDE but I will certainly take the dkms install --all command under advisement for the next kernel upgrade. You aren't saying that the OFED modules will recompile themselves after upgrading the kernel are you? I kind of thought that was what you and Jeff were saying early in this thread. Upgrade Kernel, install new kernel's header package, then reboot and it will just work. That was my naive expectation based on the conversation. After that didn't work for me, am I now correct in thinking that I should still need to run "dkms install --all" after installing the headers but before rebooting? Do I need to reboot after installing the new headers and then run the dkms command? You probably can't dumb it down too much for me.
 
If you install a kernel package and the associated kernel headers are installed, all modules registered with DKMS should be rebuilt automatically. You can always check the status of DKMS modules with "dkms status", which might be a good idea before you try to reboot in a new kernel.
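
A minimal sketch of that pre-reboot check, assuming the "name, version, kernel, arch: status" line format that `dkms status` prints in this thread (other DKMS versions may print a different format). The function name `dkms_not_installed` is made up for illustration; it lists every registered module that is not reported as installed for the given kernel.

```shell
#!/bin/sh
# List DKMS modules that are registered but not "installed" for one kernel.
# Reads `dkms status` output on stdin, assuming the format seen in this thread:
#   name, version, kernel, arch: status
dkms_not_installed() {
    kver="$1"
    awk -F', ' -v kver="$kver" '
        NF { names[$1] = 1 }                               # every registered module
        $3 == kver && $4 ~ /: installed/ { ok[$1] = 1 }    # built for the target kernel
        END { for (n in names) if (!(n in ok)) print n }
    ' | sort
}

# Example (hypothetical kernel version):
#   dkms status | dkms_not_installed 4.4.10-1-pve
```

If the example prints nothing, every registered module has been built for that kernel; any name it prints is worth investigating before you reboot.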

I just tried the "MLNX_OFED_LINUX-3.3-1.0.0.0-debian8.3-x86_64.tgz" release by Mellanox. The install script there is still broken and tries to remove Proxmox packages, but the deb files in it seem to work as expected (and without dependencies on newer standard libraries for the user space tools).

I just realized that our pve-headers-xx packages have a dependency on the respective pve-kernel-xx package, which is wrong (apt-get tries to install the kernel package before the headers package, so when the DKMS rebuild is triggered the headers are not yet available). This should be fixed soon, along with a new pve-headers meta package that will always depend on the latest pve-headers-xx package.

before re-installing pve-kernel and triggering a DKMS build
Code:
# dkms status
iser, 1.8.0, 4.4.8-1-pve, x86_64: installed (original_module exists)
kernel-mft-dkms, 4.4.0, 4.4.8-1-pve, x86_64: installed
knem, 1.1.2.90mlnx1, 4.4.8-1-pve, x86_64: installed
mlnx-ofed-kernel, 3.3, 4.4.8-1-pve, x86_64: installed (original_module exists)
srp, 1.6.0, 4.4.8-1-pve, x86_64: installed (original_module exists)

reinstalling the kernel (this could be replaced with "dkms autoinstall -k 4.4.10-1-pve; update-initramfs -u" to just trigger a rebuild of DKMS modules and the initramfs)
Code:
# apt-get install --reinstall pve-kernel-4.4.10-1-pve

Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following packages were automatically installed and are no longer required:
  asciidoc pve-doc-generator
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 37 not upgraded.
Need to get 0 B/44.0 MB of archives.
After this operation, 0 B of additional disk space will be used.
(Reading database ... 112558 files and directories currently installed.)
Preparing to unpack .../pve-kernel-4.4.10-1-pve_4.4.10-54_amd64.deb ...
Unpacking pve-kernel-4.4.10-1-pve (4.4.10-54) over (4.4.10-54) ...
Setting up pve-kernel-4.4.10-1-pve (4.4.10-54) ...
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.4.10-1-pve /boot/vmlinuz-4.4.10-1-pve
run-parts: executing /etc/kernel/postinst.d/dkms 4.4.10-1-pve /boot/vmlinuz-4.4.10-1-pve
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.4.10-1-pve /boot/vmlinuz-4.4.10-1-pve
update-initramfs: Generating /boot/initrd.img-4.4.10-1-pve
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.4.10-1-pve /boot/vmlinuz-4.4.10-1-pve
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.4.10-1-pve
Found initrd image: /boot/initrd.img-4.4.10-1-pve
Found linux image: /boot/vmlinuz-4.4.8-1-pve
Found initrd image: /boot/initrd.img-4.4.8-1-pve
Found linux image: /boot/vmlinuz-4.4.6-1-pve
Found initrd image: /boot/initrd.img-4.4.6-1-pve
Found linux image: /boot/vmlinuz-4.2.6-1-pve
Found initrd image: /boot/initrd.img-4.2.6-1-pve
Found memtest86+ image: /ROOT/pve-1@/boot/memtest86+.bin
Found memtest86+ multiboot image: /ROOT/pve-1@/boot/memtest86+_multiboot.bin
done
Processing triggers for initramfs-tools (0.120+deb8u1) ...
update-initramfs: Generating /boot/initrd.img-4.4.10-1-pve
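
The alternative mentioned above (rebuilding without reinstalling the kernel package) could look roughly like the following sketch. It assumes the headers package provides the conventional /lib/modules/<version>/build link; the function name, the `RUN` dry-run switch and the `MODULES_DIR` override are hypothetical helpers, not part of DKMS or Proxmox.

```shell
#!/bin/sh
# Rebuild DKMS modules and the initramfs for one kernel version.
# Set RUN=echo for a dry run that only prints the commands.
# MODULES_DIR is a hypothetical override for testing; default is /lib/modules.
rebuild_dkms_for() {
    kver="$1"
    moddir="${MODULES_DIR:-/lib/modules}/$kver"
    # DKMS cannot build without the kernel headers (assumed to sit behind "build").
    if [ ! -e "$moddir/build" ]; then
        echo "headers for $kver not found (expected $moddir/build)" >&2
        return 1
    fi
    ${RUN:-} dkms autoinstall -k "$kver" || return 1
    # -k makes sure the initramfs of this kernel is the one that gets regenerated.
    ${RUN:-} update-initramfs -u -k "$kver"
}

# Example: RUN=echo rebuild_dkms_for 4.4.10-1-pve
```

Passing `-k` to `update-initramfs` is a small deviation from the one-liner above: without it, only the initramfs of the newest installed kernel is updated, which may not be the kernel you just rebuilt modules for.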

dkms status after rebuild shows that the modules for 4.4.10 are installed:
Code:
# dkms status
iser, 1.8.0, 4.4.10-1-pve, x86_64: installed (original_module exists)
iser, 1.8.0, 4.4.8-1-pve, x86_64: installed (original_module exists)
kernel-mft-dkms, 4.4.0, 4.4.10-1-pve, x86_64: installed
kernel-mft-dkms, 4.4.0, 4.4.8-1-pve, x86_64: installed
knem, 1.1.2.90mlnx1, 4.4.10-1-pve, x86_64: installed
knem, 1.1.2.90mlnx1, 4.4.8-1-pve, x86_64: installed
mlnx-ofed-kernel, 3.3, 4.4.10-1-pve, x86_64: installed (original_module exists)
mlnx-ofed-kernel, 3.3, 4.4.8-1-pve, x86_64: installed (original_module exists)
srp, 1.6.0, 4.4.10-1-pve, x86_64: installed (original_module exists)
srp, 1.6.0, 4.4.8-1-pve, x86_64: installed (original_module exists)
 
Thank you so much. That probably explains the error messages I got during last kernel upgrade. I first was very happy that the headers-package now is kept up to date automatically when the system gets upgraded, but then was a little disappointed when getting the errors (although I could have seen it too by checking the order in which they got installed). ;-)

Cheers, Johannes
 
Thank you very much Fabian for taking the time to clarify and for giving us all a little tutorial on how to maintain kernel modules! Very helpful and saved me a ton of time flailing around on Google.

Most Sincerely,

GB
 
Thank you so much. That probably explains the error messages I got during last kernel upgrade. I first was very happy that the headers-package now is kept up to date automatically when the system gets upgraded, but then was a little disappointed when getting the errors (although I could have seen it too by checking the order in which they got installed). ;-)

Just to make this clear - the pve-headers-xx packages are not automatically installed for new kernels (yet). They are automatically upgraded for ABI-compatible upgrades (because those are just new versions of the same packages, not new packages). This will probably change, I posted a patch to pve-devel yesterday that enables building a "pve-headers" meta package that always depends on the newest "pve-headers-xx" package, just like the meta package "proxmox-ve" always depends (among other things) on the newest "pve-kernel-xx" package. But this patch has not yet been applied - I will post a short reminder here if and when the pve-headers package is available in the repositories.

As a side note, we are also working on getting more up-to-date Mellanox kernel modules integrated into our kernel packages, but this will take a little more time. Once that is done, you should be able to drop the packages from Mellanox again unless you need the userspace/firmware/etc. tools.
 
As a side note, we are also working on getting more up-to-date Mellanox kernel modules integrated into our kernel packages,

Hello Fabian,
does this mean they will be built-in in the Kernel or via DKMS ?

I recently ran into a bug in ib_srp regarding mappings that reliably caused a kernel panic / hang.
A simple fio run or a cp of a big file was enough to trigger this, from the hypervisor and also from within VMs that live on this storage.

Must be related with this:
https://www.spinics.net/lists/linux-rdma/msg35324.html

This is obviously fixed in-tree, so i tried to backport to your 4.4 Kernel - without success.
There were just too many changes to make it compile at all.
(it failed to build due to changes in other parts of the ib-stack, so i dug further and replaced those with current versions, then it wasn't compatible with netfilter, and so on)

After a while i ended with building a 4.7-rc2 + recent zfs against it for proxmox to see if this would fix these mad crashes.

SRP is stable now, also at higher rates and performance is fine. Until now i also didn't run into any troubles related to the new kernel with other parts of the Hypervisor.
But i`m really not keen about running this in production.

I`m not sure if the patches that fixed the issue will ever be backported to 4.4 (should be, as it is still supported), and when.
Also, i wonder if this will be merged into Mellanox OFED, and also, when.

So, when you work on the integration of an OFED other than in-tree, would it be possible to investigate if this issue was addressed there ?

Also, i guess the OFED from here is much more up-to-date:
http://downloads.openfabrics.org/OFED/

Most likely, 3.18-2 contains recent commits.

If you have something to test/build i would be happy to do so in my labs fabric.

Alex
 
Hello Fabian,
does this mean they will be built-in in the Kernel or via DKMS ?

yes, as a pre-built / compiled module, not DKMS

I recently ran into a bug in ib_srp regarding mappings that reliably caused a kernel panic / hang.
A simple fio run or a cp of a big file was enough to trigger this, from the hypervisor and also from within VMs that live on this storage.

Must be related with this:
https://www.spinics.net/lists/linux-rdma/msg35324.html

This is obviously fixed in-tree, so i tried to backport to your 4.4 Kernel - without success.
There were just too many changes to make it compile at all.
(it failed to build due to changes in other parts of the ib-stack, so i dug further and replaced those with current versions, then it wasn't compatible with netfilter, and so on)

After a while i ended with building a 4.7-rc2 + recent zfs against it for proxmox to see if this would fix these mad crashes.

SRP is stable now, also at higher rates and performance is fine. Until now i also didn't run into any troubles related to the new kernel with other parts of the Hypervisor.
But i`m really not keen about running this in production.

I`m not sure if the patches that fixed the issue will ever be backported to 4.4 (should be, as it is still supported), and when.
Also, i wonder if this will be merged into Mellanox OFED, and also, when.

So, when you work on the integration of an OFED other than in-tree, would it be possible to investigate if this issue was addressed there ?

Also, i guess the OFED from here is much more up-to-date:
http://downloads.openfabrics.org/OFED/

Most likely, 3.18-2 contains recent commits.

If you have something to test/build i would be happy to do so in my labs fabric.

Alex

if you know about specific commits/patches that fix issues for you, feel free to give us a pointer on pve-devel. if it is feasible, we can try to backport individual patches to our kernel, but since we currently don't have Mellanox hardware (might change in the near future) it is always a bit difficult to test. the next kernel based on 4.4.13 will probably be available for testing next week by the way ;)
 
Also, i guess the OFED from here is much more up-to-date:
http://downloads.openfabrics.org/OFED/

Most likely, 3.18-2 contains recent commits.

Forget this, i had a quick look at it.

However, it looks like ib_srp from Mellanox OFED is much older than in-tree.

This is what i get when i build with the Sources from Mellanox:
modinfo ib_srp
filename: /lib/modules/4.4.10-1-pve/updates/dkms/ib_srp.ko
version: 1.6.0
license: Dual BSD/GPL
description: InfiniBand SCSI RDMA Protocol initiator
author: Roland Dreier
srcversion: 035E18BD4F05A7ECF305271
depends: ib_core,ib_sa,scsi_transport_srp,mlx_compat,ib_cm
vermagic: 4.4.10-1-pve SMP mod_unload modversions


On a fresh install it looks like:
modinfo ib_srp
filename: /lib/modules/4.4.6-1-pve/kernel/drivers/infiniband/ulp/srp/ib_srp.ko
release_date: July 26, 2015
version: 2.0
license: Dual BSD/GPL
description: InfiniBand SCSI RDMA Protocol initiator

So, either the Module-Info in the Mellanox sources was never updated or also here ib_srp is very old, even older than in-tree :(
 
yes, as a pre-built / compiled module, not DKMS

if you know about specific commits/patches that fix issues for you, feel free to give us a pointer on pve-devel .

Ok, DKMS would be easier to update / re-compile, or am i wrong with this ?

For me it`s impossible to get the idea how these patches relate to each other - i`m really not so deep into C and the whole kernel-development.

I guess people at linux-rdma mailing list know what exactly fixed this specific issue, if you look in the 4.7 - tree there happened quite a lot around the srp-stuff the last weeks.

As you use an Ubuntu-Kernel, maybe they will merge these into the xenial-tree ?

Currently it looks like it hasn't happened yet, in the 4.7 tree there are many newer commits.

http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/log/drivers/infiniband/ulp/srp/ib_srp.c

Alex
 
Ok, DKMS would be easier to update / re-compile, or am i wrong with this ?

you can always use DKMS or compile your own modules, no matter which version we ship with pve-kernel-xx ;) the problem with that approach is that it can easily break with updates:
you install an update, DKMS should automatically recompile the modules, but for some reason the compilation fails and you don't notice. after a reboot, your modules are either gone or reverted to the bundled ones. for modules that we ship together with the kernel, this cannot happen (because the whole kernel build fails if one individual module fails to compile, and does not reach the repositories)
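
one way to notice such a silent fallback is to look at where the module file for a given kernel actually lives. the following is only a sketch: `module_origin` is a made-up helper name, and the two path patterns are taken from the modinfo output posted earlier in this thread (updates/dkms for DKMS builds, kernel/ for modules shipped with the kernel package).

```shell
#!/bin/sh
# Classify a module file path as printed by `modinfo -n <module>`.
module_origin() {
    case "$1" in
        */updates/dkms/*) echo dkms ;;      # built and installed by DKMS
        */kernel/*)       echo in-tree ;;   # shipped with the kernel package
        *)                echo unknown ;;
    esac
}

# Example, run before rebooting into the new kernel (version is hypothetical):
#   module_origin "$(modinfo -k 4.4.10-1-pve -n ib_srp)"
```

if the example reports in-tree for a module you expected DKMS to provide, the rebuild for that kernel did not happen or did not succeed.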

I guess people at linux-rdma mailing list know what exactly fixed this specific issue, if you look in the 4.7 - tree there happened quite a lot around the srp-stuff the last weeks.

As you use an Ubuntu-Kernel, maybe they will merge these into the xenial-tree ?

Currently it looks like it hasn't happened yet, in the 4.7 tree there are many newer commits.

http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/log/drivers/infiniband/ulp/srp/ib_srp.c

unfortunately, backporting all of the changes from 4.7 (no matter in which area of the kernel) is not possible - the result would be far from stable anyhow. individual, understandable and necessary backports/cherry-picks and security updates are of course possible. to give you an idea: the last few stable 4.4 kernel releases (e.g., from 4.4.12 to 4.4.13) were each around a hundred patches altogether in linux-stable, Ubuntu then adds a few of their own (this varies, but usually fewer than upstream) and Proxmox then adds a few more. the more patches we add, the more difficult it becomes to keep the kernel up to date - the same is true for Ubuntu. so if possible, backported or cherry-picked patches should always be integrated as far up the chain as possible - ideally in linux-stable itself.
 
Yes, i absolutely understand that it`s not possible to backport everything. Result can only be a big mess.

You are absolutely right, that indeed it should come from upstream / linux-stable.
Hopefully these fixes will make it in the 4.4 - tree, as it`s quite odd to crash the whole node with something simple like cp on a single file.

When you decide to ship Mellanox OFED with your kernel ...
Would it not be better to stay with the in-tree OFED as it`s for sure newer and fixed way faster ?
I wonder if this is not a step in the wrong direction regarding an up-to-date and stable OFED stack.
 
The Mellanox drivers that ship with the 4.4.21-1-pve kernel work great with my Mellanox ConnectX-2 cards.

However the eIPoIB driver is missing.
The eIPoIB driver enables bridging on the Infiniband IPoIB device, which allows us to use Infiniband for the guest machines.

@fabian do you think there is any chance of getting the eth_ipoib module built in to the proxmox kernel?

EDIT: More info on eIPoIB can be found here: https://lwn.net/Articles/507258/
 
The Mellanox drivers that ship with the 4.4.21-1-pve kernel work great with my Mellanox ConnectX-2 cards.

However the eIPoIB driver is missing.
The eIPoIB driver enables bridging on the Infiniband IPoIB device, which allows us to use Infiniband for the guest machines.

@fabian do you think there is any chance of getting the eth_ipoib module built in to the proxmox kernel?

EDIT: More info on eIPoIB can be found here: https://lwn.net/Articles/507258/

is there anything more recent than that (unmerged) patch set from 2012? seems like the reactions were not very positive back then (taken from the v2 from August 2012: https://lwn.net/Articles/509448/)

eIPoIB does not work.

I can't get an IP address with out a specially configured dhcp server, and special dhcp clients.

eIPoIB does not work with IPv6.

As David Miller already said this code has no chance of being merged.
 
I have not seen any newer merge requests than that.

The eipoib module is shipped with the most recent versions of the OFED package from both Mellanox (OFED 3.4) and from the OpenFabrics Alliance (OFED 4.8).

I am however unsure if there have been any updates to the eIPoIB driver or if they are simply just shipping the old version.
 
After taking a quick look at the sources from the Mellanox OFED 3.4 I can see that there are some differences from the V2 patch.

I have not been able to find any newer merge requests on the kernel mailing list. But I did see that Oracle Linux accepted the eipoib driver back in 2013.

The patches are here: https://patchwork.ozlabs.org/project/netdev/list/?state=*&q=net/eipoib&archive=both
And further discussions here: https://marc.info/?l=linux-netdev&w=2&r=1&s=eipoib&q=b

I have attached the sources from OFED 3.4 to this post if anyone wants to take a look.
 

Attachments: eth_ipoib_sources.zip (29.4 KB)
After taking a quick look at the sources from the Mellanox OFED 3.4 I can see that there are some differences from the V2 patch.

I have not been able to find any newer merge requests on the kernel mailing list. But I did see that Oracle Linux accepted the eipoib driver back in 2013.

The patches are here: https://patchwork.ozlabs.org/project/netdev/list/?state=*&q=net/eipoib&archive=both
And further discussions here: https://marc.info/?l=linux-netdev&w=2&r=1&s=eipoib&q=b

I have attached the sources from OFED 3.4 to this post if anyone wants to take a look.

the reaction from the upstream maintainers was pretty much "this is very wrong, this won't get merged" so I don't think this is something that we want to carry on our own. I am not an IB expert though, did any of the recommendations by the linux-net people get implemented? EoIB (i.e., "Ethernet over IB") or the proposed virtio-infiniband drivers?
 
I understand the reaction from the upstream maintainers and agree with them, this is not the "proper way" of doing things. But it's a good workaround while no other solutions exist.

I haven't found any information on EoIB or IPoIB moving in this direction yet. IPoIB is still layer 3 and that will probably not change, there was some talk about EoIB but nothing recent.
The virtio-infiniband drivers seem to be on the roadmap, but it looks like there hasn't been any progress on that yet. Or at least nothing publicly available.

I did take a look at the openvswitch mailing list as well, unfortunately Infiniband support doesn't seem to be getting any attention there.

It looks like the eIPoIB driver is the only thing that's usable right now.

Going through the changelogs for OFED I can see that eIPoIB is still being developed; there was even a bugfix on eIPoIB back in February 2016.
 
The Mellanox drivers that ship with the 4.4.21-1-pve kernel work great with my Mellanox ConnectX-2 cards.

However the eIPoIB driver is missing.
I don't know why it's missing, but this tidbit from the OFED 3.4 manual may be of interest to you: Ethernet over IB (EoIB) is currently supported in ConnectX®-3/ConnectX®-3 Pro adapter cards only.

What's most interesting (to me) about that statement is that it's NOT supported for ConnectX-4 hardware. What I read from it is that EoIB is deprecated for all intents and purposes. Since most currently shipping hardware is CNAs anyway, I can see why.
 
I don't know why it's missing, but this tidbit from the OFED 3.4 manual may be of interest to you: Ethernet over IB (EoIB) is currently supported in ConnectX®-3/ConnectX®-3 Pro adapter cards only.

You are talking about EoIB while I was talking about eIPoIB.
But yeah, seems like eoib and eipoib aren't getting much attention lately unfortunately.


I've been trying various setups on my Infiniband cards with the intention to get vm's running on ipoib.

Installed OFED 3.4
Got ipoib and eipoib working. ipoib and eipoib performance is the same, but it is worse than with the ipoib kernel module that ships with proxmox. I tried various tweaks but was unable to achieve the same performance.
Any idea why that might be @fabian ?

Hacked together the 3.4 eipoib kernel module with the 2.2-1 mlx kernel modules that ship with the proxmox kernel, performance was better but kernel panics did follow.

Tried to get eoib running, but without luck. I got the eoib kernel module compiled and loaded but was unable to get it working.

Finally I found out that I could use gre tunnels to get layer 2 over ipoib.

The "native" linux gre tunnel performs great, and is on par with the proxmox ipoib kernel module.
Using gre with Open vSwitch does not give the same performance and is approx. 3 Gbit/s slower than the "native" gre tunnel. I tried a few tweaks but wasn't able to get any significant performance increase.

Sadly it seems that native linux gre tunnels can't be used in a ovs bridge.

Seems like there is no winning in this situation!
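
For anyone wanting to try the "native" gre approach, a rough sketch follows. The post above does not give the exact commands, so this is an assumption: it uses a gretap device (the layer-2 flavour of GRE in iproute2) between two IPoIB addresses and attaches it to a plain Linux bridge. The function name, interface names and addresses are all hypothetical, and `RUN=echo` gives a dry run.

```shell
#!/bin/sh
# Create a layer-2 GRE (gretap) tunnel between two IPoIB addresses and attach
# it to a Linux bridge. Set RUN=echo to print the commands instead of running
# them (running them for real requires root).
setup_gretap() {
    name="$1"; local_ip="$2"; remote_ip="$3"; bridge="$4"
    ${RUN:-} ip link add "$name" type gretap local "$local_ip" remote "$remote_ip" || return 1
    ${RUN:-} ip link set dev "$name" up || return 1
    ${RUN:-} ip link set dev "$name" master "$bridge"
}

# Example with made-up names and addresses:
#   RUN=echo setup_gretap gretap1 10.10.10.1 10.10.10.2 vmbr1
```

The same commands with local and remote swapped would be needed on the other node. MTU tuning is left out here and probably matters for performance.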
 
Installed OFED 3.4
Got ipoib and eipoib working. ipoib and eipoib performance is the same, but it is worse than with the ipoib kernel module that ships with proxmox. I tried various tweaks but was unable to achieve the same performance.
Any idea why that might be @fabian ?

unfortunately no - our IB experience is rather limited, the few users here in the forum that are actually using it in production can probably give more input..
 
