PVE and Ceph on Mellanox infiniband parts, what is the current state of support?

alyarb

Well-Known Member
Feb 11, 2020
We are using PVE and Ceph on Dell blades in an M1000e modular chassis.

We are currently using dual mezzanine cards with 2x 10 GbE ports, one for Ceph front-end and one for Ceph back-end.

Public LANs, guest LANs, and Corosync are handled by 4x 10GbE cards on 40GbE MXL switches, so all is good there.

OSDs are 8 TB Intel P4510 NVMe/PCIe 3.0 drives, and I need to know the best way to build the Ceph networks to get the most out of them.

As the Ceph front-end and back-end are each composed of 20 Gb LAGs, our I/O bottleneck is quite obvious: benchmark testing pegs right at 2 GB/s. If I combine the Ceph networks onto a single 40 Gb LAG, performance is slightly worse.
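For context, the split itself is nothing exotic, just the usual public/cluster network definition in ceph.conf, roughly like the sketch below (subnets are placeholders, not our real ones):

    # /etc/pve/ceph.conf (excerpt) -- example subnets only
    [global]
        public_network  = 10.10.10.0/24   # Ceph front-end: client and monitor traffic
        cluster_network = 10.10.20.0/24   # Ceph back-end: OSD replication and recovery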


I just need to know, given the constraints of the M1000e, what is the best way to build the Ceph front/back end networks for the best guest VM performance? What is the general community experience and sentiment?

There are 56 Gb and 40 Gb InfiniBand options for our hardware that we are looking at to try to push the bottleneck out as far as we can.

Is InfiniBand supported on PVE/Ceph? Do you run it in pure InfiniBand mode, Ethernet mode, or IPoIB? Is RDMA working / do you use it? What kind of performance are you getting?

All the available NIC options are based on ConnectX-3 silicon. The FDR switch is the M4001F and the FDR10 switch is the M4001T, so apparently the FDR and FDR10 parts are not interchangeable. I am shooting for at least 56 Gb anyway.

All the NICs are dual-port NICs and there will be 4 switches in the chassis. How can I combine / aggregate / bond the links to effectively get 112 Gb?
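On the Ethernet side today we just use ordinary LACP bonds, something like the sketch below (interface names and addresses are examples); whether anything equivalent exists for the InfiniBand options is exactly what I'm unsure about.

    # /etc/network/interfaces (excerpt) -- current 2x 10GbE LACP bond, names/addresses are examples
    auto bond1
    iface bond1 inet static
        address 10.10.20.11/24
        bond-slaves enp65s0f0 enp65s0f1
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-miimon 100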
 
56Gb IPoIB performance was atrocious, less than 10GbE. I ignored the advice from everyone and had to discover this for myself to believe it.

This thread remains for others to do the same: take it or leave it!
 
56Gb IPoIB performance was atrocious, less than 10GbE. I ignored the advice from everyone and had to discover this for myself to believe it.

This thread remains for others to do the same: take it or leave it!
Hey, thanks a lot, I was just planning on doing that myself. Good thing I posted!
Have you tried other things, such as EoIB or RDMA?
 
Your application must provide native, robust, and ongoing RDMA support, and Ceph is not one of those applications. Its RDMA effort was abandoned ages ago.
 
Bumping this up.
Is anyone successfully using Proxmox with InfiniBand cards/switches?
 
Bumping this up.
Is anyone successfully using Proxmox with InfiniBand cards/switches?

Yes, I'm using a 56 Gbps InfiniBand card and switch.
It works in IPoIB mode.

I used the CLI to configure IP over InfiniBand, and the GUI to configure Ceph.
It works, but bandwidth is about 20 Gbps.
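The CLI part is basically just giving the IPoIB interface an address in /etc/network/interfaces, roughly like this (interface name and address are examples; connected mode and the large MTU are optional tuning):

    # /etc/network/interfaces (excerpt) -- example IPoIB interface
    auto ib0
    iface ib0 inet static
        address 10.10.30.11/24
        pre-up echo connected > /sys/class/net/ib0/mode   # switch from datagram to connected mode
        mtu 65520                                          # connected mode allows a much larger MTU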

 
Well, 20 Gbps is not that much... I mean, I use Mellanox ConnectX-3 IB PCIe cards with an Arista switch in Ethernet mode and I get 25 Gbps :( I'm sure I could fine-tune that, but it's enough for my needs.
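(For anyone wondering, flipping ConnectX-3 ports from InfiniBand to Ethernet mode goes through the mlx4 driver, something along these lines; the PCI address is an example:)

    # switch both ports of a ConnectX-3 (mlx4) card to Ethernet mode -- PCI address is an example
    echo eth > /sys/bus/pci/devices/0000:41:00.0/mlx4_port1
    echo eth > /sys/bus/pci/devices/0000:41:00.0/mlx4_port2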
 
Well, 20 Gbps is not that much... I mean, I use Mellanox ConnectX-3 IB PCIe cards with an Arista switch in Ethernet mode and I get 25 Gbps :( I'm sure I could fine-tune that, but it's enough for my needs.

Yep, IPoIB mode depends on single-core CPU performance, and my E5-2682 v4 CPUs already carry a heavy VM load.
On a freshly installed node, bandwidth tests show about 40 Gbps of IPoIB throughput on a 56 Gbps FDR InfiniBand link.
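A quick way to see the single-core ceiling is to compare one iperf3 stream against several parallel streams (peer address is an example):

    # one TCP stream over IPoIB -- usually capped by a single core
    iperf3 -c 10.10.30.12

    # eight parallel streams -- spreads the work, closer to what Ceph generates
    iperf3 -c 10.10.30.12 -P 8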
 
RDMA does not exist in Ceph. It is Ethernet only.

There was an effort for it that was abandoned 6-7 years ago.

https://github.com/Mellanox/ceph/tree/luminous-12.1.0-rdma
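For reference, the experimental messenger in that branch was enabled with ceph.conf options along these lines (Luminous-era option names; don't expect this to work, or be supported, on a current build):

    # ceph.conf (excerpt) -- abandoned experimental RDMA messenger, shown for reference only
    [global]
        ms_type = async+rdma
        ms_async_rdma_device_name = mlx4_0   # ConnectX-3 device as seen by ibv_devices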

I think yes, but also no. I use a Mellanox adapter to build an IPoIB network (RoCE v1/v2 supported); the problem is how to configure Ceph to work with the IPoIB network and use the RDMA feature. Maybe my knowledge is wrong. If so, please correct me...
 
Yes, I'm using a 56 Gbps InfiniBand card and switch.
It works in IPoIB mode.

I used the CLI to configure IP over InfiniBand, and the GUI to configure Ceph.
It works, but bandwidth is about 20 Gbps.

Question out of curiosity: if we build 3 nodes and use Mellanox MC2207130 FDR cables directly attached port to port, i.e. card to card without a switch, are we able to get the native speed of 40 Gbps?
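What I have in mind is roughly the routed full-mesh layout below, with each node using both ports to reach the other two directly (addresses and interface names are made up, and I'm assuming the cards are flipped to Ethernet mode):

    # node 1 of 3 -- /etc/network/interfaces sketch for a switchless full mesh
    auto ens1f0
    iface ens1f0 inet static
        address 10.15.15.1/32
        up ip route add 10.15.15.2/32 dev ens1f0    # direct link to node 2

    auto ens1f1
    iface ens1f1 inet static
        address 10.15.15.1/32
        up ip route add 10.15.15.3/32 dev ens1f1    # direct link to node 3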
 
