[TUTORIAL] Broadcom NICs down after PVE 8.2 (Kernel 6.8)

I am currently trying to follow the steps described by @SkepticNerdGuy who apparently has the same motherboard and the same problem.
According to the guide he linked I need to install the pve headers to get niccli to work - e.g.
Code:
apt install pve-headers-$(uname -r)

However, it seems these headers don't exist in my case. My kernel is 6.8.4-2-pve, and when I try to run

Code:
apt install pve-headers-6.8.4-2-pve

I am told that no package was found.
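A quick way to see whether any header package exists for the running kernel is to ask apt for each candidate name. This is only a sketch: the proxmox-headers- prefix is an assumption (it is not mentioned anywhere in this thread), and header_candidates / check_headers are made-up helper names.

```shell
# Print candidate header package names for a kernel release string.
# $1 = kernel release (defaults to the running kernel)
# NOTE: the "proxmox-headers-" prefix is an assumption, not taken from this thread.
header_candidates() {
    krel="${1:-$(uname -r)}"
    printf '%s\n' "pve-headers-$krel" "proxmox-headers-$krel"
}

# Report which candidates the configured repositories actually offer.
check_headers() {
    for pkg in $(header_candidates "$1"); do
        if apt-cache show "$pkg" >/dev/null 2>&1; then
            echo "available: $pkg"
        else
            echo "not found: $pkg"
        fi
    done
}

# On the host:
#   check_headers              # checks the running kernel
#   check_headers 6.8.4-2-pve  # checks a specific release
```

If every candidate comes back as not found, the configured repositories are probably worth checking before anything else.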
 

I've got the H12SSL-CT, which has the same onboard NIC, and I don't have this issue because my NICs use the bnxt_en driver rather than bnxt_re. I did try updating before installing the new kernel, though, and it failed with the same error. What does lshw say about your NIC configuration?
Mine contains this: driver=bnxt_en driverversion=6.8.12-1-pve ... firmware=214.4.9.10/pkg 214.0.286.18
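For comparing hosts, the driver and firmware fields can be pulled out of the lshw configuration line with sed. This is a rough sketch; lshw_field and lshw_firmware are hypothetical helper names, and the firmware pattern assumes the "<fw>/pkg <pkg>" layout seen in this thread.

```shell
# Pull a single key=value field out of an lshw "configuration:" line on stdin.
# $1 = key name, e.g. driver or driverversion
lshw_field() {
    sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p"
}

# The firmware value contains a space ("<fw>/pkg <pkg>"), so split it explicitly.
lshw_firmware() {
    sed -n 's|.*firmware=\([^/ ]*\)/pkg \([^ ]*\).*|fw=\1 pkg=\2|p'
}

# Typical use on a live host (root gives the most complete output):
#   lshw -class network | grep configuration: | lshw_field driver
#   lshw -class network | grep configuration: | lshw_firmware
```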

Have you tried the 6.8.4-3 kernel?
 
@billy999 Thanks for trying to help me out, appreciate it.
In my case it shows:

Code:
configuration: autonegotiation=on broadcast=yes depth=32 driver=bnxt_en driverversion=6.8.4-2-pve duplex=full firmware=226.0.145.0/pkg 226.1.107.1 latency=0 link=yes mode=1280x1024 multicast=yes port=twisted pair speed=1Gbit/s visual=truecolor xres=1280 yres=1024
                configuration: autonegotiation=on broadcast=yes driver=bnxt_en driverversion=6.8.4-2-pve firmware=226.0.145.0/pkg 226.1.107.1 latency=0 link=no multicast=yes port=twisted pair

Regarding the kernel, is there even a newer one available? Or would I have to manually build and install it? There are no updates available for me.
 
I'm on the no-sub repo; the newest kernel there is 6.8.12-1.

The NIC issue doesn't seem to appear when the firmware is much older, like mine. Either way, the Thomas Krenn bnxtnvm tool doesn't work for flashing a new one; you should try the official Broadcom niccli utility. I didn't try niccli since I planned to fix it retroactively, but the issue never came up for me (I also had my server running on 6.8.4-2 for quite some time).
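For anyone without the no-sub repo configured, the APT source line looks roughly like the output of the helper below. Treat it as a sketch: no_sub_repo_line is a made-up name, the bookworm default assumes PVE 8, and the list file name in the comment is just an example location.

```shell
# Print the APT source line for the Proxmox VE no-subscription repository.
# $1 = Debian codename; "bookworm" (PVE 8) is assumed as the default here.
no_sub_repo_line() {
    printf 'deb http://download.proxmox.com/debian/pve %s pve-no-subscription\n' "${1:-bookworm}"
}

# To apply it (as root), one could write it to its own list file and refresh.
# The file name below is an arbitrary example:
#   no_sub_repo_line > /etc/apt/sources.list.d/pve-no-subscription.list
#   apt update && apt full-upgrade
```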
 
@billy999 Nice, I forgot about that. I switched to the no-sub repo and I am now also on 6.8.12-1. Also got the headers installed.
This means niccli now works, and it lists my NICs:

1724361936651.png

I will now attempt to do the firmware upgrade described by @SkepticNerdGuy and see how it goes...
 
@billy999 Okay, I flashed firmware 229.0.141.0, and the good news is: it works. The network interface of the Proxmox system is now up and vmbr0 gets the IP I configured, as expected.

However, I noticed that on reboot, proxmox still gets stuck for a good minute, and eventually systemd-networkd-wait-online throws an error. Here are the details:

1724363915882.png

Not sure what causes this or how to fix it, but at least the system now gets an IP..
 
It's not able to get an IP while booting; the service is there to wait for one. This usually happens when auto vmbr0 is removed from the config.
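Whether the auto vmbr0 line is still present can be checked with a small grep wrapper. This is a minimal sketch: has_auto_iface is a hypothetical helper, and /etc/network/interfaces is assumed as the default config path.

```shell
# Succeed if the interfaces file brings up the given interface at boot
# via an "auto <iface>" line.
# $1 = interface name, $2 = interfaces file (defaults to /etc/network/interfaces)
has_auto_iface() {
    grep -Eq "^[[:space:]]*auto[[:space:]]+(.*[[:space:]])?$1([[:space:]]|\$)" "${2:-/etc/network/interfaces}"
}

# On the host:
#   has_auto_iface vmbr0 && echo "auto vmbr0 present" || echo "auto vmbr0 missing"
```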
 
I just noticed that Proxmox/Debian shouldn't have this service enabled by default. AFAIK only Ubuntu does. Maybe check your systemd-analyze critical-chain
Did that, and as you can see, systemd-networkd-wait-online took over 2 minutes...
1724369121054.png

Should I just disable the service, or do you have a better idea to troubleshoot this?
 
It seems like the issue is fixed now.
One of my hosts (Supermicro H13SSL-NT) is successfully running on kernel 6.8.12-1-pve since 18 August.

The onboard BCM57416 NICs are on this firmware version:
Code:
Active Package version      : 226.1.107.1
Package version on NVM      : 226.1.107.1
Firmware version            : 226.0.145.0

I only had to disable the bnxt_re kernel module to solve the "bnxt_re_is_fw_stalled" issue which happens during boot and delays the boot by about 100 seconds:
Code:
echo "blacklist bnxt_re" >> /etc/modprobe.d/blacklist-bnxt_re.conf
update-initramfs -u
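Note that the echo with >> appends a second copy of the line each time it is run. A variant that only adds the line when it is missing could look like this (a sketch; add_blacklist is a made-up helper name):

```shell
# Append "blacklist <module>" to a modprobe.d file only if that exact line is missing.
# $1 = module name, $2 = target file
add_blacklist() {
    line="blacklist $1"
    if [ -f "$2" ] && grep -qxF "$line" "$2"; then
        echo "already present: $line"
    else
        echo "$line" >> "$2"
        echo "added: $line"
    fi
}

# On the host (as root), followed by the initramfs rebuild from the post:
#   add_blacklist bnxt_re /etc/modprobe.d/blacklist-bnxt_re.conf
#   update-initramfs -u
```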
 
Your update and blacklist did indeed help.

Unmasking the service at boot, as suggested by others, did brick the networking: I couldn't even restart networking, since it threw a "service is masked" error.

niccli did not install correctly, with the known errors from Thomas-Krenn. Even the normal FAQ fixes did not work, but this script did run the updates: it failed on the PCIe card but updated the onboard NICs, and with the infiniband driver blacklisted my server came back up, just slower than normal.

Thank you very much!
 
We had some issues with some Broadcom NICs going down after the update to kernel 6.8.

Workaround: the NICs come up if you run service networking restart.
FIX: Update the Broadcom firmware to the latest version and blacklist their "beautiful" infiniband driver.

This will update ALL YOUR Broadcom network cards to their latest firmware (live, but a reboot is needed afterwards):



This is a snippet from our standards in our Thomas-Krenn PVE Ceph deployments. There's also a snippet for blacklisting the infiniband driver:

Code:
echo "blacklist bnxt_re" >> /etc/modprobe.d/blacklist-bnxt_re.conf
update-initramfs -u

The firmware update needs a reboot to become active!
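After the reboot, it may be worth confirming that the blacklist actually took effect. A minimal sketch, assuming the usual /proc/modules layout with the module name in the first column; module_loaded is a hypothetical helper:

```shell
# Succeed if the named module appears in a /proc/modules style listing.
# $1 = module name, $2 = listing file (defaults to /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# After the reboot, on the host:
#   module_loaded bnxt_re && echo "bnxt_re is still loaded" || echo "bnxt_re is not loaded"
#   module_loaded bnxt_en && echo "bnxt_en is loaded"
```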
I tried but I got this:
2024-10-11 15:28:45 (1.11 MB/s) - ‘bnxtnvm.zip’ saved [1039248/1039248]

Archive: bnxtnvm.zip
inflating: bnxtnvm

Broadcom NetXtreme-C/E/S firmware update and configuration utility version v222.0.144.0

NetXtreme-E Controller #1 at PCI Domain:0000 Bus:12 Dev:00

This adapter is not supported for online firmware update.


Broadcom NetXtreme-C/E/S firmware update and configuration utility version v222.0.144.0

NetXtreme-E Controller #1 at PCI Domain:0000 Bus:5d Dev:00

This adapter is not supported for online firmware update.
 
The blacklisting should be enough for now if you can't upgrade. It seems you have OEM firmware rather than a regular Broadcom NIC?
 
Welp, I'm stuck. Upgrading the kernel and blacklisting didn't solve this for me, so I tried to do the firmware update, and it can't even do that.

1731668065641.png


The board is a Supermicro H12DSI and the NIC shows up as BCM57416, so it should be the same, but maybe I've bricked it now. niccli -i 1 show is now giving:
1731668222053.png

And service networking restart doesn't bring the NICs up on a clean boot. They were working in Ubuntu Server previously, so I know they weren't just straight-up broken.
 
Try the opt-in 6.11 kernel; in our case the errors with the Broadcom NICs disappeared (at least, no blacklisting is needed anymore). Might be worth a try for you. You might need to fix your firmware first; contact your hardware vendor for that.
 
Gah! It turns out that this entire time, the issue was a marginal cable that was
- good enough to negotiate a stable 1G link
- good enough for the nic to try to negotiate a 10G link
- good enough to maintain a stable 10G link before I bumped it or something when switching to proxmox
- bad enough for the link to immediately die once established

niccli also apparently cannot fully communicate correctly with the onboard 10G nic on the supermicro H12DSI, because it still says "card is in a bad state" right now, but a full 10G link is working correctly now after replacing the cable.
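A marginal cable like this tends to show up in the kernel's per-interface counters before anything else. A small sketch for reading them from sysfs; link_summary is a made-up helper, and the interface name in the usage comment is only a placeholder:

```shell
# Print speed (Mbit/s), carrier state and carrier change count for an interface.
# $1 = interface name, $2 = sysfs net directory (defaults to /sys/class/net)
link_summary() {
    dir="${2:-/sys/class/net}/$1"
    for attr in speed carrier carrier_changes; do
        # Reading can fail while the link is down or the file is absent,
        # so fall back to "n/a".
        val=$(cat "$dir/$attr" 2>/dev/null) || val="n/a"
        echo "$attr=$val"
    done
}

# On the host (the interface name is a placeholder):
#   link_summary eno1np0
```

A carrier_changes value that keeps climbing while nothing is being replugged would point at a flapping link rather than a driver or firmware problem.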
 
Why not just go into the BIOS setup for the two onboard Supermicro Broadcom NICs, open "Device Configuration Menu", and set "Support RDMA" to disabled? (The default is enabled on my board.)
This should fix it, no? And it would not require a blacklist.

Or failing that:

Disable RDMA​

To Disable RDMA, use the following commands:
./niccli -i <index> nvm -setoption support_rdma -scope 0 -value 0
./niccli -i <index> nvm -setoption support_rdma -scope 1 -value 0
https://techdocs.broadcom.com/us/en.../adapters/Tuning/dpdk-tunings/nic-tuning.html
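With more than one adapter, the two commands above have to be repeated per index. A dry-run sketch that only prints them for review; rdma_disable_cmds is a hypothetical helper, and the indices 1 and 2 in the usage comment are assumptions that should be replaced with whatever niccli lists on the host:

```shell
# Print the two niccli commands from the post for one adapter index.
# $1 = adapter index as shown by niccli
rdma_disable_cmds() {
    for scope in 0 1; do
        echo "./niccli -i $1 nvm -setoption support_rdma -scope $scope -value 0"
    done
}

# Review the output first; only then run the printed commands, for example:
#   for i in 1 2; do rdma_disable_cmds "$i"; done
```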
 
I can confirm that setting "Support RDMA" to disabled for both Ethernet controllers on an H13SSL-NT mainboard solves the issue. No blacklisting needed either.
 
