Proxmox VE 4.1 Infiniband problem

ikn · Mar 13, 2016

Hi.
I am sorry for my bad english

I have 2 cluster on the Proxmox VE ver. 3.4 and 4.1
All equipment is absolutely identical in both clusters
All nodes has 2*X5650 Xeon, 96Gb RAM and InfiniBand Card: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT / s - IB QDR / 10GigE] (rev b0)
Both clusters uses of the same Mellanox infiniband unmanaged switches
In version 3.4 infiniband works well, without any problems.
In version 4.1 with high network activity network (aka vzdump) becomes unavailable.
After that, in the event log there is a message about the loss of a quorum and then reboots the node.

e100 · Apr 7, 2016

I can confirm that this is a real problem.

Here is what I know:
1. There are no kernel messages when the IPoIB network stops working
2. If I ifdown then ifup the IPoIB interface the IB network starts working again
3. Only seems to happen if server is under heavy load from ( lots of IO and/or lots of CPU usage)

Some additional information.

I had this problem when doing an off-line disk move in the Proxmox interface from CEPH to DRBD9
My disks and network are capable of syncing at 100MB/sec, I had set c-max-rate to 102400 (100MB/s) and experienced this network issue multiple times.
I changed c-max-rate to 10240 (10MB/sec ) and do not have this network issue.

However, on some nodes if I perform the disk move on them even with c-max-rate set to 10MB/sec the network will drop.
Most likely the additional CEPH traffic is triggering the problem.

Code:

# pveversion
pve-manager/4.1-22/aca130cf (running kernel: 4.2.8-1-pve)

The nodes where I have seen this issue have Dual Intel(R) Xeon(R) CPU E5420 @ 2.50GHz

e100 · Apr 7, 2016

I think I discovered my issue.

Some of the DRBD resources were listening on the Infiniband and others on the Ethernet.
After get everything listening on on Infiniband all seems to work just fine.

Apparently I messed up when adding one of the nodes and forgot to specify the IP address of the IB network.

ikn · Apr 7, 2016

After a two month of use I downgrade cluster from version 4.1 to version 3.4.
3.4 works fine.

e100 · Apr 8, 2016

ikn said:
3.4 works fine

I'll not argue with that, as much as I hate to say it 4.x is not production ready yet.

I had another incident of the IB network interface failing. Again I found processes that were making a connection from the ethernet IP to another server's IPOIB IP address. Instead of DRBD it looked like it was VNC connection from another Proxmox node to a KVM process.

There is clearly something wrong with IPOIB on the newer 4.2 and 4.4 kernels.

fabian · Apr 8, 2016

Maybe you are affected by https://bugzilla.proxmox.com/show_bug.cgi?id=927 / https://bugzilla.kernel.org/show_bug.cgi?id=111921 ?

e100 · Apr 8, 2016

I've seen those bugs, the problem I am having does not result in just poor performance but complete network outage.
Could be the same bug but my symptoms seems to be a little different.

I downgraded the kernel to 4.2.8-37that Andrey reported as not having problems but it made no difference for me.

mir · Apr 8, 2016

The kernel guys are looking into this bug. Expect a patch next week.

e100 · Apr 8, 2016

When I was using only CEPH never had an issue, started setting up DRBD and have had nothing but network problems in the Infiniband.

I'd be happy to test a patched kernel once its available.

e100 · Apr 9, 2016

mir said:
The kernel guys are looking into this bug. Expect a patch next week.

I think I might be suffering from a different known bug that has not been fixed for my IB driver.

Back in 2014 the mlx4 driver was updated to use GFP_NOIO for QP creation when using connected mode and the IPoIB driver was updated to request GFP_NOIO from the hardware drivers.
This was to prevent a deadlock when using NFS on IPoIB and it was speculated that other things like iSCSI could possibly trigger a deadlock too.
https://lkml.org/lkml/2014/5/11/50

In Jan 2016 the qib driver was also updated with this fix:
http://comments.gmane.org/gmane.linux.drivers.rdma/32914

The driver for my IB cards has only been updated to recognize that GFP_NOIO was requested and report the following message:
ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL

My IB network had been stable until I started using DRBD, been running CEPH over it for months without issue.
I have had stability issues using KRBD but librbd works just fine.
KRBD involves kernel IO and networking thus fits in with the GFP_NOIO problem.

DRBD performs IO over the network so its quite possible that DRBD9 combined with IPoIB and a driver that is not using GFP_NOIO is triggering deadlocks.
There is plenty of evidence that this might be happening.
DRBD commands often stall for a long time.
IB interfaces stop passing any traffic
Network problems happen when there is little free RAM and high IO load on DRBD such a doing a full sync at 100MB/sec or restoring a VM from backup into DRBD storage.

I've been running DRBD8.3 and 8.4 on IPoIB for years on this same hardware without problems but DRBD9 has been nothing but problems.

Today I reconfigured DRBD to use an Ethernet IP address instead of the IB.
No more DRBD stability issues, no more IB networking issues, works as expected just slower.
Had three failed attempts to restore a VM into DRBD when using IPoIB, using ethernet it restored fine on the first try.

I doubt the Mellanox devs want to invest time fixing a driver for a legacy product, two years and no fix is a pretty good sign.
Any idea how I can get my driver patched? This is above my skill set.

mir · Apr 9, 2016

Have you considered replacing with ConnectX family nics? Development of mthca has stopped so you will likely never see the GFP_NOIO patch for this. On the other hand support is still active for any ConnectX family nic.

e100 · Apr 10, 2016

The expense of replacing them is the only thing stopping me.

If it's not a huge amount of money I could likely pay someone to patch mthca.

e100 · Apr 13, 2016

Attached is the GFP_NOIO patch I wrote for the mthca driver.

I wrote it against the latest pve-kernel source that uses the 4.4 kernel from ubuntu-xenial
I'll let it bake in my test cluster for a few days and report back if it resolved the issues or not.

ikn · Nov 15, 2016

Hi again.
I installed PVE 4.3 and latest updates.
Ceph over infiniband working fine
NFS periodically stalled. If I connect my NFS server over ethernet its working fine.

root@Node-204:~# uname -a
Linux Node-204 4.4.21-1-pve #1 SMP Thu Oct 20 14:56:39 CEST 2016 x86_64 GNU/Linux

mir · Nov 15, 2016

ikn said:
Hi again.
I installed PVE 4.3 and latest updates.
Ceph over infiniband working fine
NFS periodically stalled. If I connect my NFS server over ethernet its working fine.

root@Node-204:~# uname -a
Linux Node-204 4.4.21-1-pve #1 SMP Thu Oct 20 14:56:39 CEST 2016 x86_64 GNU/Linux

I have seen the same problems also with NFS over infiniband.

ikn · Nov 16, 2016

May be this problem is described here https://bugzilla.kernel.org/show_bug.cgi?id=111921
And solution is here http://marc.info/?l=linux-rdma&m=147620680520525&w=2

mir · Nov 16, 2016

Latest proxmox 4.4 kernel have this patch.

ikn · Nov 30, 2016

I updated nodes to the latest version of the kernel. This does not solve my problem.

root@Node-204:~# uname -a
Linux Node-204 4.4.24-1-pve #1 SMP Mon Nov 14 12:30:24 CET 2016 x86_64 GNU/Linux

After some time, NFS server is stalled. Usually after migration VM HDD about 20-100Gb
iperf works fine. I tested infiniband with iperf and i download and upload more than 10Tb without any problem

Any ideas?

e100 · Dec 1, 2016

I've had nothing but problems running drbd over infiniband in proxmox 4.x. CEPH works fine tho.

I got some connectX cards to see if that makes a difference but not had time to test them.

I'm hoping my workload will lighten up in January so I can start looking into the problems again.

mir · Dec 1, 2016

e100 said:
I got some connectX cards to see if that makes a difference but not had time to test them.

I am using connectX cards without any lock. Relucktantly I have come to the conclusion that Ubuntu's kernel 4.4 is broken regards infiniband and/or NFS.

Proxmox VE 4.1 Infiniband problem

Active Member

Renowned Member

Renowned Member

Active Member

Renowned Member

Proxmox Staff Member

Renowned Member

Famous Member

Renowned Member

Renowned Member

Famous Member

Renowned Member

Renowned Member

Attachments

Active Member

Famous Member

Active Member

Famous Member

Active Member

Renowned Member

Famous Member

We value your privacy