Mellanox ConnectX-5 weird performance issues over MPLS

lamcnamara

New Member
Mar 24, 2024
I feel like I need to give a bit of background here.

We have two locations, one in Atlanta and one in Dallas. We have an MPLS connection between the two sites and redundant internet connections to separate firewalls and separate switches in both locations. The main focus here is the MPLS connection, which is a 1 Gbps link between the two locations connecting the switches.

We've also got a mixture of Dell hardware in each location: some relatively new C6520 blades, some R750s, and some slightly older R440s. The C6520s and R750s were purchased at the same time and use the OCP ConnectX-5 card. We also purchased some PCIe ConnectX-5 cards and added them to the R440s. Most of these machines are running Proxmox, though some are just NFS storage.

We were doing some initial testing with MySQL backing up to the Texas site and noticed pretty terrible performance using the R440s in Texas as the host. We verified this with iperf between both sites at the hypervisor layer. When the R750 or C6520 is the host or client, it maxes out our 1 Gbps connection across the MPLS and stays at 1 Gbps for the length of the test (5-10 minutes).

When we try the same test using the R440s as the iperf server, the performance is 1 Gbps at first but quickly drops to 400 Mbps and then to 200 Mbps, where it hovers for the duration of the test. We were originally running the tests in Ubuntu on the machines, but decided to test from Proxmox itself and found the same issue.

I guess my first question is about firmware. We want to upgrade the firmware on these R440s, and Nvidia has some documentation on the process, but the firmware software wants to upgrade some libraries, so obviously there is some concern there. Has anyone had experience with upgrading the firmware upgrade utility in Proxmox? Should we be concerned about upgrading? While this isn't a production environment, I'd hate to brick this machine, or have to harden the whole machine again, just because we upgraded one library.

In some initial research I see others have mentioned that iperf is single-threaded, so I'll take a look at running a different version of iperf that is multi-threaded. Really I'm just looking for advice on this one, as I haven't run into this issue on any of our newer machines and I'm not sure where to turn. Nvidia has been particularly useless, as they require an enterprise support contract to even answer simple questions.
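
For anyone who wants to reproduce the multi-stream comparison, here is a rough sketch of how it could be scripted. It assumes iperf3 is installed on both ends and that an `iperf3 -s` server is already running on the R440; the function names and the hostname in the comments are placeholders of mine, not anything from our setup. It only uses the standard iperf3 client options `-c`, `-P` (parallel streams), `-t` (duration) and `-J` (JSON output). As far as I understand, whether those parallel streams actually run on separate threads depends on the iperf3 version, so treat this as a starting point rather than a definitive test.

```python
import json
import subprocess


def build_iperf3_command(server, streams=4, seconds=60):
    """Build an iperf3 client command with parallel streams and JSON output."""
    return [
        "iperf3",
        "-c", server,        # client mode: connect to this iperf3 server
        "-P", str(streams),  # number of parallel TCP streams
        "-t", str(seconds),  # test duration in seconds
        "-J",                # emit the report as JSON
    ]


def received_mbps(report):
    """Return the receiver-side aggregate throughput in Mbit/s.

    Expects the parsed JSON report of a TCP test, where the summed
    receiver rate is stored under end -> sum_received -> bits_per_second.
    """
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def run_test(server, streams=4, seconds=60):
    """Run one iperf3 test and return the measured throughput in Mbit/s."""
    cmd = build_iperf3_command(server, streams, seconds)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return received_mbps(json.loads(result.stdout))


# Example usage (hostname is a placeholder for the R440 running "iperf3 -s"):
#
#   for streams in (1, 4, 8):
#       print(streams, "streams:", round(run_test("r440.example", streams), 1), "Mbit/s")
```

If the aggregate rate with several streams reaches the full 1 Gbps while a single stream still sags to 200 Mbps, that would point at a per-connection limit rather than the link or the card as a whole.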
 
