Problem with Broadcom NIC on PVE 9

ToeiRei

Active Member
Jun 7, 2020
Hey guys,

Due to the known problems with PVE 9 on Intel NICs, I checked my server and thought I was in the clear to upgrade from 8 to 9, since I have Broadcom NICs:

Code:
root@pve:/lib/firmware# lspci | grep -i net
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
01:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
01:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
root@pve:/lib/firmware#

Pretty much nothing in the logs that points to a problem. So I looked at a test machine:

Code:
root@pve:~# lspci | grep -i net
46:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
46:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)

Also, weird performance drops. Turning offloading on or off made no difference at all, and I'm currently out of ideas: sticking with kernel 6.8 does fix the Linux bridges, but that's not a permanent solution.
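For anyone who wants to repeat the offloading test, here is a minimal sketch of how such a toggle can be done with `ethtool -K`. The interface name `eno1`, the choice of features (TSO, GSO, GRO) and the helper name `set_offloads` are assumptions for illustration, not necessarily what was toggled above. By default the function only prints the command, so nothing changes until `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: switch common offloads on or off for one interface.
# set_offloads is a hypothetical helper; adjust the feature list as needed.
set_offloads() {
    iface="$1"   # interface name, e.g. eno1 (assumption)
    state="$2"   # "on" or "off"
    cmd="ethtool -K $iface tso $state gso $state gro $state"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: only show what would be executed.
        echo "$cmd"
    else
        # Real run: needs root and an existing interface.
        $cmd
    fi
}
```

`set_offloads eno1 off` prints the command, `DRY_RUN=0 set_offloads eno1 off` applies it, and `ethtool -k eno1` shows the resulting feature state.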

Update on some tinkering:
On the test box (the RDMA cards) I migrated from Linux bridges to OVS, which seems to get me around the problem. BUT: iperf reports something like 20k retransmits. That is not normal behavior. I mean, "normal" is a loaded term when Broadcom firmware is involved. It's like calling a ticking bomb "a charming decorative piece."
As I am used to tinkering with my own kernels, I pulled /lib/firmware from the official git and could get the retransmit count in iperf from several thousand down to 6:

Code:
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.10.10, port 33378
[  5] local 192.168.10.2 port 5201 connected to 192.168.10.10 port 33386
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   117 MBytes   978 Mbits/sec    0    367 KBytes
[  5]   1.00-2.00   sec   116 MBytes   969 Mbits/sec    3    367 KBytes
[  5]   2.00-3.00   sec   115 MBytes   966 Mbits/sec    0    367 KBytes
[  5]   3.00-4.00   sec   115 MBytes   967 Mbits/sec    0    367 KBytes
[  5]   4.00-5.00   sec   116 MBytes   972 Mbits/sec    1    367 KBytes
[  5]   5.00-6.00   sec   115 MBytes   966 Mbits/sec    1    367 KBytes
[  5]   6.00-7.00   sec   116 MBytes   969 Mbits/sec    0    367 KBytes
[  5]   7.00-8.00   sec   115 MBytes   967 Mbits/sec    0    411 KBytes
[  5]   8.00-9.00   sec   115 MBytes   968 Mbits/sec    0    411 KBytes
[  5]   9.00-10.00  sec   115 MBytes   966 Mbits/sec    1    411 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.13 GBytes   969 Mbits/sec    6            sender
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
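To compare runs without eyeballing the table, the Retr column can be summed from the per-interval lines. This is a small sketch that assumes the iperf3 text layout shown above (single stream, Retr as the ninth whitespace-separated field); `sum_retr` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: sum the Retr column of iperf3 per-interval lines read from stdin.
# Interval lines have 11 fields and the bitrate unit in field 8;
# header and summary lines have fewer fields and are skipped.
sum_retr() {
    awk 'NF == 11 && $8 ~ /bits\/sec$/ { sum += $9 } END { print sum + 0 }'
}
```

Saving the output above to a file and running `sum_retr < iperf.log` should give the same total as the sender summary line.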

Weirdly enough, this is reliably reproducible on the test system. The live system, on the other hand, completely choked, and the best I could get it to do is run PVE 9 with Linux bridges on the old 6.8 kernel.
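Since the two boxes behave so differently, it may help to confirm which firmware each NIC actually reports after the files in /lib/firmware were swapped. A rough sketch, assuming the usual `ethtool -i` output with a `firmware-version:` line; the helper name `fw_version` and the interface name in the usage line are assumptions:

```shell
#!/bin/sh
# Sketch: extract the firmware-version value from `ethtool -i` output on stdin.
fw_version() {
    awk -F': ' '$1 == "firmware-version" { print $2 }'
}
```

Usage would be something like `ethtool -i eno1 | fw_version` on both machines, so the reported versions can be compared side by side.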


Changes:
- clarification that I have no Intel NICs, only Broadcom
- firmware, iperf
- Production system addition
 