Terrible networking performance with mlx5_core (Mellanox ConnectX-4)

wickleighter

Jan 19, 2024
Hi all,

I'm trying to evaluate a Proxmox install with iperf3 and am struggling to understand the networking configuration.

I've installed the current release, 8.1.4, and have a ConnectX-4 Lx PCIe card installed with the latest firmware and OFED drivers (mlx5_core).

I have no VMs and no firewall, so this should just be raw performance.

I installed the iperf3 daemon on the Proxmox host (10.10.10.2) and tested against this node:

Code:
❯ iperf3 -c 10.10.10.2
Connecting to host 10.10.10.2, port 5201
[  5] local 172.22.178.82 port 38858 connected to 10.10.10.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.69 GBytes  23.1 Gbits/sec    0   3.84 MBytes
[  5]   1.00-2.00   sec  2.72 GBytes  23.3 Gbits/sec    0   3.84 MBytes
[  5]   2.00-3.00   sec  2.66 GBytes  22.8 Gbits/sec    0   3.84 MBytes
[  5]   3.00-4.00   sec  2.65 GBytes  22.8 Gbits/sec    0   3.84 MBytes
[  5]   4.00-5.00   sec  2.71 GBytes  23.3 Gbits/sec    0   3.84 MBytes
[  5]   5.00-6.00   sec  2.74 GBytes  23.5 Gbits/sec    0   3.84 MBytes
[  5]   6.00-7.00   sec  2.62 GBytes  22.5 Gbits/sec    0   3.84 MBytes
[  5]   7.00-8.00   sec  2.70 GBytes  23.2 Gbits/sec    0   3.84 MBytes
[  5]   8.00-9.00   sec  2.74 GBytes  23.5 Gbits/sec    0   4.03 MBytes
[  5]   9.00-10.00  sec  2.74 GBytes  23.5 Gbits/sec    0   4.03 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  27.0 GBytes  23.2 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  27.0 GBytes  23.2 Gbits/sec                  receiver

OK, that looks sane, but asking the Proxmox host to send via --reverse gives a baffling result:

Code:
❯ iperf3 -c 10.10.10.2 -R
Connecting to host 10.10.10.2, port 5201
Reverse mode, remote host 10.10.10.2 is sending
[  5] local 172.22.178.82 port 54488 connected to 10.10.10.2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   232 MBytes  1.94 Gbits/sec
[  5]   1.00-2.00   sec   231 MBytes  1.94 Gbits/sec
[  5]   2.00-3.00   sec   233 MBytes  1.96 Gbits/sec
[  5]   3.00-4.00   sec   228 MBytes  1.91 Gbits/sec
[  5]   4.00-5.00   sec   227 MBytes  1.90 Gbits/sec
[  5]   5.00-6.00   sec   225 MBytes  1.88 Gbits/sec
[  5]   6.00-7.00   sec   225 MBytes  1.89 Gbits/sec
[  5]   7.00-8.00   sec   226 MBytes  1.89 Gbits/sec
[  5]   8.00-9.00   sec   225 MBytes  1.89 Gbits/sec
[  5]   9.00-10.00  sec   226 MBytes  1.90 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.23 GBytes  1.91 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.22 GBytes  1.91 Gbits/sec                  receiver


Am I missing something here?
 
Have you checked whether x2apic is enabled on the system your NIC is in? We have had cases where fast NICs were bottlenecked by interrupt handling, leading to bad performance. There should be an option in your BIOS / EFI to enable x2apic.
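To see whether interrupt handling is concentrated on one core, it can help to sum the NIC's interrupt counts per CPU. The following is only a rough sketch: the helper name irq_per_cpu is made up for illustration, and it assumes the ConnectX interrupts show up in /proc/interrupts with "mlx5" in their description, which may differ on your system.

```shell
# Sum interrupt counts per CPU for all IRQ lines matching a pattern.
# Reads text in /proc/interrupts format on stdin.
# irq_per_cpu is a hypothetical helper name, not an existing tool.
irq_per_cpu() {
    awk -v pat="$1" '
        NR == 1 { ncpu = NF; next }      # header line: CPU0 CPU1 ...
        $0 ~ pat {
            # field 1 is the IRQ number, fields 2..ncpu+1 are per-CPU counts
            for (i = 1; i <= ncpu; i++) sum[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, sum[i]
        }'
}

# Usage on a live host (pattern "mlx5" is an assumption):
#   irq_per_cpu mlx5 < /proc/interrupts
```

Running it once before and once after an iperf3 test and comparing the numbers should show which cores are servicing the card. If nearly all of the increase lands on a single CPU, that would be consistent with an interrupt bottleneck, though it does not prove one.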
 
Thanks for your reply. This is what I see in dmesg regarding APIC:

Code:
root@pve:~# dmesg | grep APIC
[    0.003692] ACPI: APIC 0x00000000B51DB000 00015E (v05 ALASKA A M I    01072009 AMI  00010013)
[    0.003703] ACPI: Reserving APIC table memory at [mem 0xb51db000-0xb51db15d]
[    0.026272] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
[    0.026283] IOAPIC[0]: apic_id 32, version 33, address 0xfec00000, GSI 0-23
[    0.026286] IOAPIC[1]: apic_id 33, version 33, address 0xfec01000, GSI 24-55
[    0.068683] APIC: Switch to symmetric I/O mode setup
[    0.069426] x2apic: IRQ remapping doesn't support X2APIC mode
[    0.069429] Switched APIC routing to physical flat.
[    0.281856] ACPI: Using IOAPIC for interrupt routing
[    0.337951] AMD-Vi: Extended features (0x246577efa2254afa, 0x0): PPR NX GT [5] IA GA PC GA_vAPIC
[    0.419049] AMD-Vi: Virtual APIC enabled

It seems PVE does not support x2apic, if I understand this correctly? It seems to fall back to some virtual APIC. I will try toggling what I can in the BIOS.
 
Can you give me the full output of the boot logs (or at least grep with the -i flag)? Otherwise it's hard to tell for sure.

Nevertheless, it seems like you are not connecting directly to the Proxmox host:
Code:
[  5] local 172.22.178.82 port 54488 connected to 10.10.10.2 port 5201

Is there a host in between? Can you tell me a bit more about the exact setup you are using to test: where the test host and the Proxmox host are, how they are connected, and the maximum possible speed of each piece of hardware involved (NIC, switch, ...)?
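For the "maximum possible speed" part, the negotiated link speed can be read from ethtool on each Linux machine. A minimal sketch, assuming the usual "Speed:" line in ethtool's output; link_speed is a made-up helper name, and the interface name and PCI address in the usage comments are placeholders you would need to replace:

```shell
# Print the negotiated link speed from `ethtool <iface>` output on stdin.
# link_speed is a hypothetical helper name, not an existing tool.
link_speed() {
    sed -n 's/^[[:space:]]*Speed:[[:space:]]*//p'
}

# Usage (enp1s0f0np0 and 01:00.0 are placeholders):
#   ethtool enp1s0f0np0 | link_speed
#   lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'   # PCIe link capability vs. negotiated state
```

The lspci line is there because a card in a slot that negotiated fewer lanes or a lower PCIe generation than expected can also cap throughput, independent of the Ethernet link speed.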
 
Thanks for your reply.

The IP was different because the test was run from WSL on the direct-attached node. Here is the same test from the host itself (with the firewall disabled, same ConnectX-4 with the latest drivers/firmware):

Code:
Connecting to host 10.10.10.2, port 5201
[  5] local 10.10.10.32 port 58817 connected to 10.10.10.2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec  2.53 GBytes  21.6 Gbits/sec
[  5]   1.01-2.01   sec  2.42 GBytes  20.8 Gbits/sec
[  5]   2.01-3.01   sec  2.34 GBytes  20.0 Gbits/sec
[  5]   3.01-4.00   sec  2.28 GBytes  19.8 Gbits/sec
[  5]   4.00-5.01   sec  2.62 GBytes  22.3 Gbits/sec
[  5]   5.01-6.01   sec  2.67 GBytes  22.8 Gbits/sec
[  5]   6.01-7.00   sec  2.72 GBytes  23.6 Gbits/sec
[  5]   7.00-8.01   sec  2.78 GBytes  23.7 Gbits/sec
[  5]   8.01-9.00   sec  2.74 GBytes  23.7 Gbits/sec
[  5]   9.00-10.01  sec  2.65 GBytes  22.6 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  25.8 GBytes  22.1 Gbits/sec                  sender
[  5]   0.00-10.01  sec  25.7 GBytes  22.1 Gbits/sec                  receiver

And in reverse:
Code:
Connecting to host 10.10.10.2, port 5201
Reverse mode, remote host 10.10.10.2 is sending
[  5] local 10.10.10.32 port 58854 connected to 10.10.10.2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec   452 MBytes  3.75 Gbits/sec
[  5]   1.01-2.00   sec   445 MBytes  3.77 Gbits/sec
[  5]   2.00-3.01   sec   460 MBytes  3.83 Gbits/sec
[  5]   3.01-4.01   sec   465 MBytes  3.88 Gbits/sec
[  5]   4.01-5.00   sec   457 MBytes  3.87 Gbits/sec
[  5]   5.00-6.01   sec   457 MBytes  3.81 Gbits/sec
[  5]   6.01-7.00   sec   459 MBytes  3.89 Gbits/sec
[  5]   7.00-8.01   sec   467 MBytes  3.89 Gbits/sec
[  5]   8.01-9.01   sec   458 MBytes  3.82 Gbits/sec
[  5]   9.01-10.00  sec   436 MBytes  3.70 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.46 GBytes  3.83 Gbits/sec  2131             sender
[  5]   0.00-10.00  sec  4.45 GBytes  3.82 Gbits/sec                  receiver

The case-insensitive dmesg result was the same; the full boot log is attached.
 

There seem to be quite a few retransmits in the second output (they might have been masked by running in WSL before). How are the machines connected? If not directly, could you try with both machines directly connected?
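To compare runs quickly, the retransmit count can be pulled out of the iperf3 summary. This is just a sketch that assumes the plain-text summary format shown in this thread; sender_retr is a made-up helper name, and it prints "n/a" when the sender line has no Retr column:

```shell
# Print the Retr value from the "sender" summary line of iperf3 text output.
# sender_retr is a hypothetical helper name, not part of iperf3.
sender_retr() {
    awk '/sender[[:space:]]*$/ {
        r = $(NF - 1)                     # field just before the word "sender"
        if (r ~ /^[0-9]+$/) print r       # numeric: this is the Retr column
        else print "n/a"                  # no Retr column on this line
    }'
}

# Usage:
#   iperf3 -c 10.10.10.2 -R | sender_retr
```

For the reverse test above this would print 2131, while the forward test from the same machine would print n/a because that summary has no Retr column.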
 