Dear community,
I am reaching out because I am at my wits' end. For the past few days, ALL my NICs have constantly been going up and down, and when they do come back, it is at 100Mb, with no connectivity.
The ONLY thing that helps is manually (via console) re-running:
ethtool -s eno1np0 speed 1000 duplex full autoneg on
ethtool -s eno2np1 speed 1000 duplex full autoneg on
Unfortunately, I cannot say EXACTLY when it started, but I started investigating because I was getting connectivity cuts / degraded performance.
I have the following hardware:
Base Board Information
Manufacturer: Supermicro
Product Name: H12SSL-CT
Version: 1.02
BIOS: (latest available)
Revision H12SS-(i)(C)(CT)(NT)_3.3_AS1.05.02_SAA1.2.0-p
BIOS Revision: 3.3
BMC Firmware Revision: 1.05.02
CPU: AMD EPYC 7313P 16-Core Processor
Onboard NIC:
Subsystem: Super Micro Computer Inc BCM57416 NetXtreme-E [15d9:16d8]
Product Name: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter
Part number: BCM957416A4160
PCI NIC:
HP NC365T network card - Intel 82580 Gigabit Ethernet Controller / 4 × RJ45 (1GbE)
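If it helps, this is how I am pulling the driver / firmware details for the onboard ports (the 46:00.0 PCI address is taken from the kernel logs further down):

ethtool -i eno1np0        # driver, firmware-version and bus-info of the bnxt_en port
lspci -nnk -s 46:00.0     # confirms which kernel driver is bound to the port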
Normally I use the onboard NICs in bond0 as vmbr0, and the PCI NIC in bond1 as vmbr1.
For the sake of testing, I decided to ONLY focus on the onboard NICs (and I removed the PCI card from the server).
cat /etc/network/interfaces
auto eno1np0
iface eno1np0 inet manual
        pre-up ethtool -s eno1np0 speed 1000 duplex full

auto eno2np1
iface eno2np1 inet manual
        pre-up ethtool -s eno2np1 speed 1000 duplex full

auto bond0
iface bond0 inet manual
        bond-slaves eno1np0 eno2np1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-lacp-rate fast
        bond-updelay 200
        bond-downdelay 200

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.254/24
        gateway 10.10.10.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
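One variation still on my list (an untested sketch; I do not know whether bnxt_en treats it any differently) is moving the forced speed into a post-up hook, so it runs after the driver has brought the port up:

auto eno1np0
iface eno1np0 inet manual
        post-up ethtool -s eno1np0 speed 1000 duplex full autoneg on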
Bond status with my normal config:
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v6.14.5-1-bpo12-pve
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200
Peer Notification Delay (ms): 0
802.3ad info
LACP active: on
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 3c:ec:ef:9a:23:0e
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 9
Partner Key: 1002
Partner Mac Address: 70:a7:41:68:11:e0
Slave Interface: eno1np0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 7
Permanent HW addr: 3c:ec:ef:9a:23:0e
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
system priority: 65535
system mac address: 3c:ec:ef:9a:23:0e
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 32768
system mac address: 70:a7:41:68:11:e0
oper key: 1002
port priority: 1
port number: 21
port state: 61
Slave Interface: eno2np1
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 6
Permanent HW addr: 3c:ec:ef:9a:23:0f
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 2
details actor lacp pdu:
system priority: 65535
system mac address: 3c:ec:ef:9a:23:0e
port key: 7
port priority: 255
port number: 2
port state: 71
details partner lacp pdu:
system priority: 65535
system mac address: 00:00:00:00:00:00
oper key: 1
port priority: 255
port number: 1
port state: 1
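Note that in this capture eno2np1 has fallen back to 100 Mbps and sits in its own aggregator (ID 2) with an all-zero partner MAC, i.e. it is not receiving LACPDUs at that moment. To catch it in the act, I keep the bond state on screen:

watch -n 1 cat /proc/net/bonding/bond0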
=======================
ethtool output:
Settings for eno1np0:
Supported ports: [ TP ]
Supported link modes: 1000baseT/Full
10000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: RS BASER
Advertised link modes: 1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 1000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: on
Port: Twisted Pair
PHYAD: 12
Transceiver: internal
MDI-X: Unknown
Supports Wake-on: g
Wake-on: d
Current message level: 0x00002081 (8321)
drv tx_err hw
Link detected: yes
root@serverbox:~# ethtool eno2np1
Settings for eno2np1:
Supported ports: [ TP ]
Supported link modes: 1000baseT/Full
10000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: RS BASER
Advertised link modes: Not reported
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 100Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: on
Port: Twisted Pair
PHYAD: 13
Transceiver: internal
MDI-X: Unknown
Supports Wake-on: g
Wake-on: d
Current message level: 0x00002081 (8321)
drv tx_err hw
Link detected: yes
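Also note the "Advertised link modes: Not reported" on eno2np1 after the fallback. One thing still on my list is re-enabling the full advertisement mask (0x020 = 1000baseT/Full, 0x1000 = 10000baseT/Full, so 0x1020 should cover both; assuming bnxt_en honours the mask):

ethtool -s eno2np1 autoneg on advertise 0x1020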
Sample logs:
Jun 04 07:55:01 serverbox kernel: bnxt_en 0000:46:00.0 eno1np0: NIC Link is Down
Jun 04 07:55:01 serverbox kernel: bond0: (slave eno1np0): speed changed to 0 on port 1
Jun 04 07:55:01 serverbox kernel: bond0: (slave eno1np0): link status definitely down, disabling slave
Jun 04 07:55:01 serverbox kernel: bond0: active interface up!
Jun 04 07:55:04 serverbox kernel: bnxt_en 0000:46:00.0 eno1np0: NIC Link is Up, 100 Mbps full duplex, Flow control: none
Jun 04 07:55:04 serverbox kernel: bnxt_en 0000:46:00.0 eno1np0: EEE is not active
Jun 04 07:55:04 serverbox kernel: bnxt_en 0000:46:00.0 eno1np0: FEC autoneg off encoding: None
Jun 04 07:55:04 serverbox kernel: bond0: (slave eno1np0): invalid new link 3 on slave
Jun 04 07:55:04 serverbox kernel: bond0: (slave eno1np0): link status definitely up, 100 Mbps full duplex
Jun 04 07:55:09 serverbox kernel: bnxt_en 0000:46:00.0 eno1np0: NIC Link is Down
Jun 04 07:55:09 serverbox kernel: bond0: (slave eno1np0): speed changed to 0 on port 1
Jun 04 07:55:09 serverbox kernel: bond0: (slave eno1np0): link status definitely down, disabling slave
Jun 04 07:55:11 serverbox kernel: bnxt_en 0000:46:00.1 eno2np1: NIC Link is Down
Jun 04 07:55:11 serverbox kernel: bond0: (slave eno2np1): speed changed to 0 on port 2
Jun 04 07:55:11 serverbox kernel: bond0: (slave eno2np1): link status definitely down, disabling slave
Jun 04 07:55:11 serverbox kernel: bond0: now running without any active interface!
Jun 04 07:55:11 serverbox kernel: vmbr0: port 1(bond0) entered disabled state
Jun 04 07:55:12 serverbox kernel: bnxt_en 0000:46:00.0 eno1np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
Jun 04 07:55:12 serverbox kernel: bnxt_en 0000:46:00.0 eno1np0: EEE is not active
Jun 04 07:55:12 serverbox kernel: bnxt_en 0000:46:00.0 eno1np0: FEC autoneg off encoding: None
Jun 04 07:55:12 serverbox kernel: bond0: (slave eno1np0): link status up, enabling it in 200 ms
Jun 04 07:55:12 serverbox kernel: bond0: (slave eno1np0): invalid new link 3 on slave
Jun 04 07:55:12 serverbox kernel: bond0: (slave eno1np0): link status definitely up, 1000 Mbps full duplex
Jun 04 07:55:12 serverbox kernel: bond0: active interface up!
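To get a feel for how often this happens, I count the flaps per hour (the grep string is taken verbatim from the messages above):

journalctl -k --since "1 hour ago" | grep -c "NIC Link is Down"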
Things I have tried / tested:
- Moved the bond from the UniFi switch to a Cisco 3650 compact switch: same problem
- Replaced all cabling with brand-new CAT6 cabling: same problem
- Disabled all energy-saving related settings in the BIOS: no luck
- Removed the bonding and used only one interface (eno1np0): the problem remains
- Tried different kernels (6.14.5-1-bpo12-pve, 6.8.12-11-pve and 6.5): problem persists
- All variations of ethtool commands, e.g.:
- ethtool -K eno2np1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off
- ethtool -s eno1np0 advertise 0x020
- It happens with the onboard NIC, the PCI NIC, and even an extra NIC I added for testing (AOC-STG-i2t): all the same
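One more knob I still want to rule out, beyond the list above (a sketch only; the logs above already say "EEE is not active", so this may well be a no-op):

ethtool --set-eee eno1np0 eee off
ethtool --set-eee eno2np1 eee off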
The ONLY thing that works for some time is manually executing:
ethtool -s eno1np0 speed 1000 duplex full autoneg on
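Since that only holds for a while, I am considering a crude watchdog run from cron every minute as a stop-gap (a rough sketch, not a fix; /sys/class/net/<nic>/speed reads -1 while a link is down):

#!/bin/bash
# Stop-gap sketch: re-force 1000/full on any port not currently linked at 1000 Mb/s.
for nic in eno1np0 eno2np1; do
    speed=$(cat /sys/class/net/"$nic"/speed 2>/dev/null)
    if [ "$speed" != "1000" ]; then
        ethtool -s "$nic" speed 1000 duplex full autoneg on
    fi
done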
OK, so at this point I get what you are thinking: the mobo is broken...
Fair point. But then I did the following test:
- Booted a Debian live ISO (latest from the site)
- Ran it with one interface (eno1np0) ---> the link stayed up INDEFINITELY under continuous iperf testing / load
- Redid the test by creating the bond on Debian ---> the link stayed up INDEFINITELY under continuous iperf testing / load
So I am going completely crazy. It must be Proxmox, right?
A new NIC is arriving later today to retest (Intel 350T v4).
Would anybody care to take a stab at this issue?
Many thanks in advance