10G NIC goes down every few seconds

bixnandy

New Member
May 6, 2024
Hello community!

I have a problem on one of my Proxmox hosts. The NIC to which my main network bridge is connected goes DOWN and UP again every few seconds.

The communication between the VMs is not interrupted. Only the communication between the VMs and the “outside world” is affected.

The problem does not occur constantly, but every few days. Even a restart did not always solve the problem.
To be honest, I don't know exactly how I fixed the error the last few times ...

At first, I thought it was a hardware problem. Then I put the bridge on the other 10G NIC. After a short time, the error occurred there too.

Now I have temporarily placed the bridge on one of the 1G NICs. For the time being, everything is working again, although more slowly of course.

Here are some brief details about the host:
It is a server that was ordered preconfigured with the following components:
- Mainboard H12DSi-NT6 with 2x10G Broadcom BCM57416 (these NICs are the ones affected by the error)
- 1x BCM95719A1904AC PCIe 4x 1G NICs (these work for the time being)

I don't think the rest of the data is important. I can provide it if needed.

This is a short section of the dmesg output:

Code:
[  890.223359] bnxt_en 0000:02:00.0 eno1np0: NIC Link is Down
[  890.225092] vmbr1: port 1(eno1np0) entered disabled state
[  892.723143] bnxt_en 0000:02:00.0 eno1np0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[  892.723152] bnxt_en 0000:02:00.0 eno1np0: EEE is not active
[  892.723154] bnxt_en 0000:02:00.0 eno1np0: FEC autoneg off encoding: None
[  892.723184] vmbr1: port 1(eno1np0) entered blocking state
[  892.723199] vmbr1: port 1(eno1np0) entered forwarding state
[  895.722180] bnxt_en 0000:02:00.0 eno1np0: NIC Link is Down
[  895.724029] vmbr1: port 1(eno1np0) entered disabled state
[  897.972793] bnxt_en 0000:02:00.0 eno1np0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[  897.972805] bnxt_en 0000:02:00.0 eno1np0: EEE is not active
[  897.972807] bnxt_en 0000:02:00.0 eno1np0: FEC autoneg off encoding: None
[  897.972855] vmbr1: port 1(eno1np0) entered blocking state
[  897.972875] vmbr1: port 1(eno1np0) entered forwarding state
[  900.231984] bnxt_en 0000:02:00.0 eno1np0: NIC Link is Down


This is my /etc/network/interfaces:

Code:
auto lo
iface lo inet loopback

auto eno1np0
iface eno1np0 inet manual
#to switch (does not work either)

auto eno2np1
iface eno2np1 inet manual
#defective (not defective after all)

auto enp161s0f1
iface enp161s0f1 inet manual

auto enp161s0f2
iface enp161s0f2 inet manual

auto enp161s0f3
iface enp161s0f3 inet manual

auto enp1s0f1
iface enp1s0f1 inet manual

auto enp1s0f2
iface enp1s0f2 inet manual

auto enp1s0f3
iface enp1s0f3 inet manual
#to modem

auto enp161s0f0
iface enp161s0f0 inet manual

iface enxbe3af2b6059f inet manual

auto enp1s0f0
iface enp1s0f0 inet manual

auto vmbr0
iface vmbr0 inet manual
        bridge-ports enp1s0f3
        bridge-stp off
        bridge-fd 0
#External (to modem)

auto vmbr1
iface vmbr1 inet static
        address 10.10.10.111/16
        gateway 10.10.10.254
        bridge-ports eno1np0 enp1s0f2
        bridge-stp off
        bridge-fd 0
#Internal

eno1np0 and eno2np1 are the 10G NICs that cause the problems.
This is the current configuration, which works because the 1G NICs are in use.

I'm at my wit's end.
Does anyone have any ideas on how I can find and solve the problem?

Many, many thanks
Andreas
 
Set bridge STP = on or create a bond
Hi floh8,

unfortunately, setting STP to on did not solve the problem for me.

I will try to create a bond!

Code:
[142277.141822] bnxt_en 0000:02:00.1 eno2np1: NIC Link is Down
[142277.143656] vmbr1: port 1(eno2np1) entered disabled state
[142279.642551] bnxt_en 0000:02:00.1 eno2np1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[142279.642562] bnxt_en 0000:02:00.1 eno2np1: EEE is not active
[142279.642565] bnxt_en 0000:02:00.1 eno2np1: FEC autoneg off encoding: None
[142279.642615] vmbr1: port 1(eno2np1) entered blocking state
[142279.642639] vmbr1: port 1(eno2np1) entered listening state
[142281.678645] vmbr1: port 1(eno2np1) entered learning state
[142283.726569] vmbr1: port 1(eno2np1) entered forwarding state
[142283.726595] vmbr1: topology change detected, sending tcn bpdu
[142283.726666] fwbr30001i0: port 1(fwln30001i0) received tcn bpdu
[142283.726669] fwbr30001i0: topology change detected, propagating
[142288.171059] bnxt_en 0000:02:00.1 eno2np1: NIC Link is Down
 
I just tried adding a bond between the NIC and the bridge. I hope that is how you meant it?

Unfortunately, that does not solve my problem either...

/etc/network/interfaces:
Code:
auto lo
iface lo inet loopback

auto eno1np0
iface eno1np0 inet manual
#to modem

auto eno2np1
iface eno2np1 inet manual
#to switch

auto enp1s0f1
iface enp1s0f1 inet manual

auto enp1s0f2
iface enp1s0f2 inet manual

auto enp1s0f3
iface enp1s0f3 inet manual
#to switch, temporary

iface enxbe3af2b6059f inet manual

auto enp1s0f0
iface enp1s0f0 inet static
        address 10.10.10.123/16
#Management

auto bond0
iface bond0 inet manual
        bond-slaves eno1np0
        bond-miimon 100
        bond-mode balance-rr
#bond_extern

auto bond1
iface bond1 inet manual
        bond-slaves eno2np1
        bond-miimon 100
        bond-mode balance-rr
#bond_intern

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp on
#External

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond1
        bridge-stp on
#Internal

dmesg:
Code:
[158143.983771] bnxt_en 0000:02:00.1 eno2np1: NIC Link is Down
[158144.050414] bond1: (slave eno2np1): link status definitely down, disabling slave
[158144.050429] bond1: now running without any active interface!
[158144.050725] vmbr1: port 1(bond1) entered disabled state
[158146.484347] bnxt_en 0000:02:00.1 eno2np1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[158146.484356] bnxt_en 0000:02:00.1 eno2np1: EEE is not active
[158146.484359] bnxt_en 0000:02:00.1 eno2np1: FEC autoneg off encoding: None
[158146.546308] bond1: (slave eno2np1): link status definitely up, 10000 Mbps full duplex
[158146.546321] bond1: active interface up!
[158146.546338] vmbr1: port 1(bond1) entered blocking state
[158146.546343] vmbr1: port 1(bond1) entered listening state
[158148.594180] vmbr1: port 1(bond1) entered learning state
[158150.642103] vmbr1: port 1(bond1) entered forwarding state
[158150.642126] vmbr1: topology change detected, sending tcn bpdu
[158150.642223] fwbr30001i0: port 1(fwln30001i0) received tcn bpdu
[158150.642228] fwbr30001i0: topology change detected, propagating
[158154.242982] bnxt_en 0000:02:00.1 eno2np1: NIC Link is Down

Any other ideas?
 
Bonds are based on 2 or more network ports, both connected to the same switch.
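
To illustrate the point about port counts, here is a rough sketch that scans text in /etc/network/interfaces format and lists bonds with fewer than two ports. It is a deliberately simplistic parser, not a full ifupdown implementation, and the function names `bond_slaves` and `single_port_bonds` are hypothetical.

```python
def bond_slaves(interfaces_text):
    """Map each iface stanza that has a bond-slaves line to its list of ports."""
    bonds = {}
    current = None
    for raw in interfaces_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        if words[0] == "iface" and len(words) > 1:
            current = words[1]  # start of a new stanza
        elif words[0] == "bond-slaves" and current is not None:
            bonds[current] = words[1:]
    return bonds


def single_port_bonds(interfaces_text):
    """Return the names of bonds that list fewer than two ports."""
    return sorted(
        name for name, ports in bond_slaves(interfaces_text).items() if len(ports) < 2
    )


# Bond stanzas copied from the interfaces file posted above.
SAMPLE = """\
auto bond0
iface bond0 inet manual
        bond-slaves eno1np0
        bond-miimon 100
        bond-mode balance-rr
#bond_extern

auto bond1
iface bond1 inet manual
        bond-slaves eno2np1
        bond-miimon 100
        bond-mode balance-rr
#bond_intern
"""

if __name__ == "__main__":
    print(single_port_bonds(SAMPLE))
```

For the configuration posted above, both bond0 and bond1 are reported, since each has only one port.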
 
OK, I think there is a misunderstanding, or more like miscommunication from my side.

My goal is not any kind of redundancy or teaming. I just want one 10G NIC that works.

In the very first interfaces file, there were two NICs on the "internal" bridge.
That is because I have put the 1G NIC on the bridge as a temporary fix and did not remove the 10G one that did not work.
Of course, you assumed that I wanted to have those two NICs on the bridge - my bad...

Then I thought that using a bond might be some kind of workaround and tried it with the one NIC that I want to use.

For clarification, what I would like to have is just one 10G NIC on each of my two bridges.
That does not work at the moment because, once the fault starts, the 10G NIC goes down every few seconds (even when the 10G NIC is the only port configured on the bridge, as it should be). As far as I can tell, the fault starts at random, anywhere from every few hours to every few days.
 
According to your post and this wiki entry (https://www.thomas-krenn.com/de/wiki/Known_Issues_Proxmox_VE_8.2), there are some strange things going on with kernel 6.8 and Proxmox 8.2 together with Broadcom NICs.
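
For anyone who wants to check whether their host matches that combination, here is a small sketch that reports the kernel series and whether the bnxt_en driver (the one named in the dmesg output earlier in this thread) is loaded. It assumes a Linux host with /proc/modules; the helper names and the sample strings in the comments are illustrative only, and it does not diagnose the issue itself.

```python
import platform
import re


def kernel_series(release):
    """Return (major, minor) from a release string such as '6.8.4-2-pve'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def loaded_modules(proc_modules_text):
    """Return the module names from text in /proc/modules format.

    Each line starts with the module name, for example:
    'bnxt_en 385024 0 - Live 0x0000000000000000' (illustrative values).
    """
    return {
        line.split()[0] for line in proc_modules_text.splitlines() if line.strip()
    }


def report(release, proc_modules_text):
    """Summarise the two facts relevant to the known-issues entry."""
    return {
        "kernel_series": kernel_series(release),
        "bnxt_en_loaded": "bnxt_en" in loaded_modules(proc_modules_text),
    }


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            modules_text = f.read()
    except OSError:
        modules_text = ""  # not Linux, or /proc not available
    print(report(platform.release(), modules_text))
```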

I just ordered an Intel 10G NIC for now. Maybe the problem will be resolved with some updates. For now, it seems to me that buying a PCIe NIC is just the more economical solution…

Thanks to everyone contributing anyway!
 
